# Patent Citation Network

The U.S. patent dataset is maintained by the [National Bureau of Economic Research](http://www.nber.org/). It spans 37 years (January 1, 1963 to December 30, 1999) and includes all utility patents granted during that period, 3,923,922 in total. The citation graph includes all citations made by patents granted between 1975 and 1999, 16,522,438 in total. For 1,803,511 of the nodes there is no information about the citations they make; only their in-links (citations received) are known.

The data was originally released by [NBER](http://www.nber.org/patents/) and can be downloaded from the [Stanford Patent Citation Network](https://snap.stanford.edu/data/cit-Patents.html) page.

## Import Packages

In [None]:
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from random import randint

%matplotlib inline

## Import Data

In [None]:
patent = pd.read_csv(
    "http://snap.stanford.edu/data/cit-Patents.txt.gz",
    compression="gzip",
    sep="\t",
    names=["start_node", "end_node"],
    skiprows=5
)
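
The `skiprows=5` argument hard-codes the number of header lines; if the file has fewer than five, data rows are discarded along with the header. A minimal alternative sketch, assuming the SNAP file marks every header line with a leading `#` (worth confirming against the downloaded file), lets pandas skip comment lines by itself. The in-memory text below is made up for illustration and is not the real file content.

```python
import io
import pandas as pd

# Made-up stand-in for the first lines of the edge list (assumption: header
# lines start with "#" and the two columns are tab-separated).
sample_text = (
    "# Directed graph\n"
    "# FromNodeId\tToNodeId\n"
    "1\t2\n"
    "1\t3\n"
    "2\t3\n"
)

edges = pd.read_csv(
    io.StringIO(sample_text),
    sep="\t",
    comment="#",  # ignore every line that starts with "#"
    names=["start_node", "end_node"],
)
print(edges.shape)
```

With this option the row count no longer depends on knowing the exact header length in advance.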

In [None]:
patent.head()

|   | start_node | end_node |
|---|------------|----------|
| 0 | 3858241 | 1324234 |
| 1 | 3858241 | 3398406 |
| 2 | 3858241 | 3557384 |
| 3 | 3858241 | 3634889 |
| 4 | 3858242 | 1515701 |


In [None]:
patent.shape

(16518947, 2)
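
Beyond the shape, a first descriptive pass can be computed directly from the edge list without building a graph. The sketch below is one possible approach: `degree_summary` is a hypothetical helper (not part of pandas or networkx), and the toy DataFrame is made up to stand in for `patent`.

```python
import pandas as pd

def degree_summary(edges: pd.DataFrame, source: str, target: str) -> dict:
    """Basic descriptive statistics for a directed edge list."""
    out_degree = edges[source].value_counts()  # citations made per citing patent
    in_degree = edges[target].value_counts()   # citations received per cited patent
    nodes = pd.unique(edges[[source, target]].to_numpy().ravel())
    return {
        "n_edges": len(edges),
        "n_nodes": len(nodes),
        "max_out_degree": int(out_degree.max()),
        "max_in_degree": int(in_degree.max()),
        # Mean over patents that make at least one citation.
        "mean_out_degree": float(out_degree.mean()),
    }

# Toy edge list (made up) standing in for the `patent` DataFrame.
toy = pd.DataFrame({"start_node": [1, 1, 2, 3], "end_node": [2, 3, 3, 4]})
summary = degree_summary(toy, "start_node", "end_node")
print(summary)
```

On the real data the call would be `degree_summary(patent, "start_node", "end_node")`; the two `value_counts()` series can also be plotted as histograms to inspect the degree distributions.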

## Build Recommendation System

Build a recommendation system based on Node2Vec using the patent citation data. Include a descriptive analysis and visualizations.

When recommending which patents should be associated with a given patent, also visualize the recommendation.
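
For intuition about what Node2Vec does before embeddings are trained, here is a minimal sketch of one second-order biased random walk on an unweighted graph. It assumes the usual return parameter `p` and in-out parameter `q`; `biased_walk` and the toy adjacency dictionary are hypothetical illustrations, not the `node2vec` package's implementation.

```python
import random

def biased_walk(adj, start, walk_length, p=1.0, q=1.0, rng=None):
    """One node2vec-style walk over `adj` (dict: node -> set of neighbours)."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < walk_length:
        current = walk[-1]
        neighbours = sorted(adj[current])
        if not neighbours:
            break  # dead end: no neighbour to move to
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in adj[previous]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move farther from the previous node
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk

# Toy undirected graph (made up for illustration).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
walk = biased_walk(adj, start=1, walk_length=6, p=1.0, q=2.0, rng=random.Random(0))
print(walk)
```

Many such walks per node are then fed to a skip-gram model as if they were sentences, so nodes that co-occur in walks end up with similar vectors.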

In [1]:
!pip install node2vec

Collecting node2vec
  Downloading node2vec-0.4.6-py3-none-any.whl (7.0 kB)
Collecting networkx<3.0,>=2.5 (from node2vec)
  Downloading networkx-2.8.8-py3-none-any.whl (2.0 MB)
Installing collected packages: networkx, node2vec
  Attempting uninstall: networkx
    Found existing installation: networkx 3.1
    Uninstalling networkx-3.1:
      Successfully uninstalled networkx-3.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lida 0.0.10 requires fastapi, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
lida 0.0.10 requires python-multipart, which is not installed.
lida 0.0.10 requires uvicorn, which is not installed.
Successfully installed networkx-2.8.8 node2vec-0.4.6


In [2]:
# Import the required libraries
import pandas as pd
import networkx as nx
from node2vec import Node2Vec
import matplotlib.pyplot as plt
import random

In [None]:
# Load the patent citation network data
patente = pd.read_csv(
    "http://snap.stanford.edu/data/cit-Patents.txt.gz",
    compression="gzip",
    sep="\t",
    names=["nodo_inicial", "nodo_final"],
    skiprows=5
)

In [4]:
# Take a fixed-size random sample of rows. The size (2,000) is an arbitrary
# choice that keeps the layout and the Node2Vec training tractable.
tamaño_muestra = min(2_000, len(patente))
muestra_datos = patente.sample(n=tamaño_muestra, random_state=42)
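
Sampling rows uniformly from millions of edges tends to yield many isolated citing/cited pairs, which gives random walks little structure to explore. As a possible alternative (not what this notebook does), the sketch below grows a connected sample by breadth-first search from a seed patent. `bfs_sample_edges` is a hypothetical helper and the toy edge list is made up.

```python
from collections import defaultdict, deque

def bfs_sample_edges(edge_list, seed, max_nodes):
    """Return the edges among the first `max_nodes` nodes reached by BFS from `seed`.

    Citations are treated as undirected for the purpose of the traversal.
    """
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)

    visited = {seed}
    queue = deque([seed])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        for neighbour in sorted(adj[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
                if len(visited) >= max_nodes:
                    break  # node budget reached

    # Keep only edges whose two endpoints were both visited.
    return [(u, v) for u, v in edge_list if u in visited and v in visited]

toy_edges = [(1, 2), (1, 3), (2, 4), (5, 6), (4, 7)]
sampled = bfs_sample_edges(toy_edges, seed=1, max_nodes=4)
print(sampled)
```

With the notebook's DataFrame, the edge list could be obtained with `patente.itertuples(index=False, name=None)`, although materializing every row as a Python tuple is memory-hungry at this scale.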

In [None]:
# Step 1: Drop duplicate rows
muestra_datos = muestra_datos.drop_duplicates()

In [None]:
# Step 2: Drop rows with missing values
muestra_datos = muestra_datos.dropna()

In [None]:
# Build a graph from the full dataset
G_completo = nx.from_pandas_edgelist(patente, "nodo_inicial", "nodo_final")

In [None]:
# Build a graph from the data sample
G_muestra = nx.from_pandas_edgelist(muestra_datos, "nodo_inicial", "nodo_final")
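
Note that `nx.from_pandas_edgelist` builds an undirected `Graph` by default, so the direction of each citation is dropped. If direction matters for the analysis, a directed graph can be requested with `create_using`. The toy DataFrame below is made up and only reuses the notebook's column names.

```python
import networkx as nx
import pandas as pd

# Toy edge list (made up): patent 1 cites 2 and 3, patent 2 cites 3.
toy = pd.DataFrame({"nodo_inicial": [1, 1, 2], "nodo_final": [2, 3, 3]})

G_undirected = nx.from_pandas_edgelist(toy, "nodo_inicial", "nodo_final")
G_directed = nx.from_pandas_edgelist(
    toy, "nodo_inicial", "nodo_final", create_using=nx.DiGraph
)

print(G_undirected.degree(3))    # citations made plus citations received
print(G_directed.in_degree(3))   # citations received only
print(G_directed.out_degree(3))  # citations made only
```

Whether to keep direction is a modeling choice: an undirected graph lets walks move both from citing to cited patents and back.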

In [None]:
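# Visualize the citation network of the full dataset.
# A spring layout over every node of the full graph is impractical, so only the
# subgraph induced by the 500 highest-degree nodes is drawn (500 is arbitrary).
grados = sorted(G_completo.degree, key=lambda par: par[1], reverse=True)
G_top = G_completo.subgraph([nodo for nodo, _ in grados[:500]])
plt.figure(figsize=(12, 8))
pos_completo = nx.spring_layout(G_top, seed=42)

# Network plot of the highest-degree nodes of the full dataset
nx.draw(G_top, pos_completo, with_labels=False, node_size=10, edge_color='g', node_color='b')
plt.title("Patent Citation Network (Full Dataset, Top 500 Nodes by Degree)")
plt.show()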

# Visualize the citation network of the data sample
plt.figure(figsize=(12, 8))
pos_muestra = nx.spring_layout(G_muestra)

# Network plot of the data sample
nx.draw(G_muestra, pos_muestra, with_labels=False, node_size=10, edge_color='g', node_color='b')
plt.title("Patent Citation Network (Data Sample)")
plt.show()

# Train the Node2Vec model on the data sample
node2vec = Node2Vec(G_muestra, dimensions=64, walk_length=30, num_walks=200, workers=4)
model = node2vec.fit(window=10, min_count=1, batch_words=4)

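# Pick a random node from the data sample.
# random.choice indexes by position, which fails on a Series whose index was
# shuffled by sample(), so the column is converted to a list first.
random_node = random.choice(muestra_datos['nodo_inicial'].tolist())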

# Generate recommendations for the selected node
recommendations = model.wv.most_similar(str(random_node), topn=5)
print(f"Recommendations for node {random_node} in the data sample:")
for rec in recommendations:
    print(rec)
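
The task statement also asks for a visualization of the recommendation. One possible sketch is to draw the selected node, the recommended nodes, and their direct neighbours with distinct colours. `plot_recommendations` is a hypothetical helper, and the toy graph and recommended IDs below are made up to stand in for `G_muestra` and the output of `model.wv.most_similar`.

```python
import matplotlib.pyplot as plt
import networkx as nx

def plot_recommendations(G, target, recommended):
    """Draw `target`, the recommended nodes, and their direct neighbours."""
    keep = {target, *recommended}
    for node in list(keep):
        keep.update(G.neighbors(node))
    sub = G.subgraph(keep)

    def colour(node):
        if node == target:
            return "red"        # the patent we recommend for
        if node in recommended:
            return "orange"     # recommended patents
        return "lightgray"      # surrounding context

    colours = [colour(node) for node in sub.nodes]
    fig, ax = plt.subplots(figsize=(8, 6))
    pos = nx.spring_layout(sub, seed=42)
    nx.draw(sub, pos, ax=ax, node_color=colours, node_size=80, edge_color="gray")
    ax.set_title(f"Recommendations for node {target}")
    return fig, colours

# Toy path graph 0-1-2-3-4-5 with made-up recommendations.
G_toy = nx.path_graph(6)
fig, colours = plot_recommendations(G_toy, target=0, recommended=[2, 3])
```

With the notebook's objects, the recommended IDs come back from `most_similar` as strings, so they would first be converted, for example with `[int(node) for node, _ in recommendations]`, before being passed together with `G_muestra` and `random_node`.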