# Getting data from the API

## Getting a Twitter Developer account

- Create an account on <a href="https://twitter.com/">Twitter</a> or sign in to an existing one
- Apply for a <a href="https://developer.twitter.com/en/portal/petition/use-case">developer account</a>  
<br/>
<img src="https://gitlab.com/andrea.navarro/inteligencia-artificial/-/raw/master/analisis-sentimiento/img/twitter_api_1.png" style="width:20%;float:left;border:1px solid black">
<img src="https://gitlab.com/andrea.navarro/inteligencia-artificial/-/raw/master/analisis-sentimiento/img/twitter_api_2.png" style="width:20%;float:left;border:1px solid black">
<img src="https://gitlab.com/andrea.navarro/inteligencia-artificial/-/raw/master/analisis-sentimiento/img/twitter_api_3.png" style="width:20%;float:left;border:1px solid black">
<img src="https://gitlab.com/andrea.navarro/inteligencia-artificial/-/raw/master/analisis-sentimiento/img/twitter_api_4.png" style="width:20%;float:left;border:1px solid black">

#### How will you use the Twitter API or Twitter Data?

I will search and filter tweets with specific hashtags in order to perform data mining and sentiment analysis practices. These tasks are part of the Artificial Intelligence course. The extracted data will not be used for any other purpose.

#### Are you planning to analyze Twitter data?

I will perform sentiment analysis of the content of the tweets and their geographical location. The type of content of each tweet will be evaluated (links, images, videos).


<br/><br/>
<img src="https://gitlab.com/andrea.navarro/inteligencia-artificial/-/raw/master/analisis-sentimiento/img/twitter_api_5.png" style="width:20%;float:left;border:1px solid black">
<img src="https://gitlab.com/andrea.navarro/inteligencia-artificial/-/raw/master/analisis-sentimiento/img/twitter_api_6.png" style="width:20%;float:left;border:1px solid black">


### Replying to the email

If Twitter sends an email requesting more information, reply with the following message.

<code>
    I will search and filter tweets with specific hashtags in order to perform data mining and sentiment analysis practices. These tasks are part of the Artificial Intelligence course. The extracted data will not be used for any other purpose.
    I will perform sentiment analysis of the content of the tweets and their geographical location. The type of content of each tweet will be evaluated (links, images, videos).
    I will not be using the Tweeting, Retweeting, or liking functionality. I will only use the API to obtain the content of tweets.
    The content of the tweets will not be shown. The content will only be used to carry out data analysis exercises during the course.
</code>

## Creating an application

- Create a project
- Create an application inside the project
- Obtain and save the keys (copy all the keys before continuing, since they cannot be viewed again later)

<br/>
<img src="https://gitlab.com/andrea.navarro/inteligencia-artificial/-/raw/master/analisis-sentimiento/img/twitter_api_8.png" style="width:30%;float:left;border:1px solid black">
<img src="https://gitlab.com/andrea.navarro/inteligencia-artificial/-/raw/master/analisis-sentimiento/img/twitter_api_7.png" style="width:70%;float:left;border:1px solid black">


## Loading the token into environment variables

 - Store the token value in a .env file
 <code>export 'BEARER_TOKEN'='your bearer token value' </code>
 - Add the .env file to .gitignore if you are working in a repository

## Example: retrieving tweet information

### Loading the token value into the application

In [1]:
import os
from dotenv import load_dotenv
# Load the values from the .env file into the environment variables
load_dotenv()
# Read the token value into a variable
bearer_token = os.environ.get("BEARER_TOKEN")
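
`os.environ.get` returns `None` when the variable is missing, which would silently produce an invalid `Bearer None` header later on. A minimal sketch of a stricter lookup follows; the helper name `read_bearer_token` and its parameters are hypothetical, not part of the original notebook.

```python
import os


def read_bearer_token(env=None, name="BEARER_TOKEN"):
    """Return the token stored under `name`, or raise if it is missing or empty."""
    # `env` defaults to the real environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    token = (env.get(name) or "").strip()
    if not token:
        raise RuntimeError(f"{name} is not set; check the .env file")
    return token


# Usage with an explicit mapping (a placeholder value, not a real token)
print(read_bearer_token({"BEARER_TOKEN": "placeholder-token"}))
```

Failing early here keeps the error close to its cause instead of surfacing as a 401 response from the API.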

### Defining the API request

#### Request URL

Define the URL for the data you need, following the <a href="https://developer.twitter.com/en/docs/twitter-api/api-reference-index">API</a> documentation.

In [2]:
url = "https://api.twitter.com/2/tweets/search/recent"

### Defining additional parameters

Define values such as the date range, hashtags, content filters, and required fields.

In [3]:
params = {
    'query': '#machinelearning -is:retweet',
    'tweet.fields':'created_at',
    'max_results':100
}
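
The `query` value is a plain string made of search operators, so it can be assembled from its parts. The sketch below is one possible way to do that; `build_query` and its arguments are hypothetical names, and it only covers the operators used in this notebook (`#hashtag`, `OR`, `from:`, `lang:` and `-is:retweet`).

```python
def build_query(hashtags, match_any=False, from_user=None, lang=None,
                exclude_retweets=True):
    """Assemble a search query string from a few common operators."""
    # A space between terms means AND; OR must be written explicitly.
    joiner = " OR " if match_any else " "
    tags = joiner.join(f"#{tag.lstrip('#')}" for tag in hashtags)
    # Group OR-ed hashtags so the remaining operators apply to the whole group.
    if match_any and len(hashtags) > 1:
        tags = f"({tags})"
    parts = [tags]
    if from_user:
        parts.append(f"from:{from_user.lstrip('@')}")
    if lang:
        parts.append(f"lang:{lang}")
    if exclude_retweets:
        parts.append("-is:retweet")
    return " ".join(parts)


print(build_query(["machinelearning"]))
print(build_query(["InteligenciaArtificial", "IA"], match_any=True, lang="es"))
```

The returned string can be used directly as the `'query'` entry of `params`.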

### Defining the headers
The headers must carry the authentication token so that the request is authorized.


In [4]:
headers = {
    "Authorization": f"Bearer {bearer_token}",
    "User-Agent":"v2FullArchiveSearchPython"
} 

### Making the request

In [5]:
import requests
response = requests.get(url, headers=headers, params=params)

# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
print("Data:\n", response.json())

Data:
 {'data': [{'created_at': '2021-10-04T03:00:13.000Z', 'id': '1444859875524231171', 'text': 'Learn! -  #ArtificialIntelligence &amp; #MachineLearning Panel Discussion with Corrado Iorizzo, #PhilipMorris International *** LIVE from the #ARCFORUM #EUROPE https://t.co/gVFU4PCBMf'}, {'created_at': '2021-10-04T03:00:06.000Z', 'id': '1444859848890408960', 'text': 'An All-Volunteer Deep Learning Army https://t.co/PJ0HcpBsMq #DL #AI #ML #DeepLearning  #ArtificialIntelligence #MachineLearning #ComputerVision #AutonomousVehicles #NeuroMorphic #Robotics'}, {'created_at': '2021-10-04T03:00:01.000Z', 'id': '1444859825934901250', 'text': 'Need a Job?\nSign up now https://t.co/rMErDK45VP\nNO MIDDLEMAN\n#Jobs #JobSearch #WorkFromHome #work #DataAnalytics #MachineLearning #Python #JavaScript #WomenWhoCode #Programming #Coding #100DaysofCode #DEVCommunity #gamedev #gamedevelopment #indiedev #IndieGameDev #Mobile #gamers https://t.co/FSHX6A5tI1'}, {'created_at': '2021-10-04T02:59:58.000Z', 'id': '1
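
Raising on every non-200 status also aborts on rate limiting, which is usually recoverable by waiting. The sketch below retries on status 429; it assumes the API reports the reset time as epoch seconds in an `x-rate-limit-reset` response header, and the helpers `seconds_until_reset` and `get_with_retry` are hypothetical. The request is injected as a callable so the logic can be exercised without network access.

```python
import time
from types import SimpleNamespace


def seconds_until_reset(headers, now=None):
    """Seconds to wait until the rate-limit window resets (0 if unknown)."""
    now = time.time() if now is None else now
    reset = headers.get("x-rate-limit-reset")  # assumed header name
    if reset is None:
        return 0.0
    return max(0.0, float(reset) - now)


def get_with_retry(fetch, max_attempts=3, sleep=time.sleep):
    """Call `fetch()` until it returns a status other than 429 or attempts run out."""
    response = None
    for attempt in range(1, max_attempts + 1):
        response = fetch()
        if response.status_code != 429 or attempt == max_attempts:
            break
        sleep(seconds_until_reset(response.headers))
    return response


# Demonstration with fake responses: one rate-limited reply, then a success.
fake_responses = iter([
    SimpleNamespace(status_code=429, headers={"x-rate-limit-reset": "0"}),
    SimpleNamespace(status_code=200, headers={}),
])
waits = []
result = get_with_retry(lambda: next(fake_responses), sleep=waits.append)
print(result.status_code, waits)
```

With the real API, `fetch` could be `lambda: requests.get(url, headers=headers, params=params)`.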

### Formatting the response

Convert the response into a Pandas dataframe.

In [6]:
import pandas as pd
df = pd.json_normalize(response.json()['data'])
df

Unnamed: 0,created_at,id,text
0,2021-10-04T03:00:13.000Z,1444859875524231171,Learn! - #ArtificialIntelligence &amp; #Machi...
1,2021-10-04T03:00:06.000Z,1444859848890408960,An All-Volunteer Deep Learning Army https://t....
2,2021-10-04T03:00:01.000Z,1444859825934901250,Need a Job?\nSign up now https://t.co/rMErDK45...
3,2021-10-04T02:59:58.000Z,1444859814455255048,#MobileGaming / #iOSGaming are probably the bi...
4,2021-10-04T02:58:42.000Z,1444859496937889796,"CMU Researchers Introduce ‘CatGym’, A Deep Rei..."
...,...,...,...
95,2021-10-04T02:07:03.000Z,1444846496516476929,DOWNLOAD Lauv x LANY - Mean It stripped #Wapba...
96,2021-10-04T02:07:01.000Z,1444846490665435144,DOWNLOAD Lauv - lonely [ft. Anne-Marie] #Wapba...
97,2021-10-04T02:06:08.000Z,1444846266895282181,Cybernetic #ArtificialIntelligence #learning #...
98,2021-10-04T02:05:08.000Z,1444846015652171778,#womenintech #django #nocode #javascript #gith...
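
Two details are easy to miss in this conversion: `created_at` arrives as a string, and indexing with `['data']` raises a `KeyError` if the payload has no such key. The sketch below handles both; it assumes (this is not confirmed by the notebook) that the API omits `data` when a search matches nothing. The helper name `tweets_to_frame` is hypothetical and the sample rows are shortened from the output above.

```python
import pandas as pd


def tweets_to_frame(payload):
    """Build a dataframe from a search payload, tolerating a missing 'data' key."""
    frame = pd.json_normalize(payload.get("data", []))
    if "created_at" in frame.columns:
        # Parse the ISO 8601 strings into timezone-aware timestamps.
        frame["created_at"] = pd.to_datetime(frame["created_at"], utc=True)
    return frame


sample_payload = {
    "data": [
        {"created_at": "2021-10-04T03:00:13.000Z",
         "id": "1444859875524231171", "text": "sample text 1"},
        {"created_at": "2021-10-04T03:00:06.000Z",
         "id": "1444859848890408960", "text": "sample text 2"},
    ]
}
sample_df = tweets_to_frame(sample_payload)
empty_df = tweets_to_frame({"meta": {"result_count": 0}})
print(sample_df.dtypes)
print(len(empty_df))
```

Having real timestamps makes later steps such as sorting or grouping by day straightforward.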


# Exercises

 Using the documentation for the <a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent">Recent</a> search endpoint and the <a href="https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query">query</a> options, obtain:

 - A list of the creation dates of the tweets posted by the user @kdnuggets that contain the hashtag #NLP

In [7]:
# Match the hashtag, exclude retweets to avoid duplicates, and keep only tweets posted by @kdnuggets
params = {
    'query': '#NLP -is:retweet from:kdnuggets',
    'tweet.fields':'created_at',
    'max_results':100
}

response = requests.get(url, headers=headers, params=params)

if response.status_code != 200:
    raise Exception(response.status_code, response.text)

df = pd.json_normalize(response.json()['data'])
df

Unnamed: 0,created_at,id,text
0,2021-09-28T14:14:03.000Z,1442855125337485318,.@BaiduResearch Releases PLATO-XL: World’s Fir...
1,2021-09-27T17:22:15.000Z,1442540100303327234,"Language, Vision and Deep Learning Models - Fr..."


- A list of the texts and usernames of the tweets that contain the hashtags #NLP and #MachineLearning and are not retweets

In [8]:
# Match both hashtags and exclude retweets to avoid duplicates
params = {
    'query': '#NLP #MachineLearning -is:retweet',
    'tweet.fields':'created_at',
    'expansions': 'author_id',
    'user.fields':'username',
    'max_results':100
}

response = requests.get(url, headers=headers, params=params)

if response.status_code != 200:
    raise Exception(response.status_code, response.text)

df = pd.json_normalize(response.json()['data'])
df

Unnamed: 0,text,author_id,id,created_at
0,MIT Graduate-Led Winter Enrichment Opportunity...,467513287,1444857221045760006,2021-10-04T02:49:40.000Z
1,The Automated Knowledge Base Construction - AK...,1390799566253920257,1444854299276857344,2021-10-04T02:38:03.000Z
2,"Microbes, Natural Intelligence and Artificial ...",2477954400,1444842320352870408,2021-10-04T01:50:27.000Z
3,RT @andi_staub https://t.co/G78omWoDyL 6 Uses ...,3358500851,1444842214509645829,2021-10-04T01:50:02.000Z
4,Hire us to do your\nExams\nNursing\nEssays\nHi...,2477954400,1444839654801375236,2021-10-04T01:39:52.000Z
...,...,...,...,...
95,BusinessBea: 10 Steps That You Never Expect On...,1399707627597279236,1444599136284823557,2021-10-03T09:44:08.000Z
96,Buy Google 5 Star Reviews-100% verified\n\nhtt...,1370393914692280322,1444590574435139586,2021-10-03T09:10:06.000Z
97,"With #NLP, radiologists have a database of met...",2567311778,1444588030115844105,2021-10-03T09:00:00.000Z
98,#MachineLearning #AI #DeepLearning #AIEthics #...,554835401,1444577001080188932,2021-10-03T08:16:10.000Z


In [9]:
df_users = pd.json_normalize(response.json()['includes']['users'])
df_users.rename(columns={'id': 'author_id'}, inplace=True)
df_users

Unnamed: 0,author_id,name,username
0,467513287,"Iain Brown, PhD",IainLJBrown
1,1390799566253920257,AINews.com,AINewsDotCom
2,2477954400,raja007,raja00710
3,3358500851,Paris Fintech Forum,ParisFinForum
4,298704683,Andreas Staub,andi_staub
5,918112383628963841,David Sobo,DS_Analytics
6,705539763349164032,ipfconline,ipfconline1
7,840922607654445058,DataWorkout,dataworkout
8,22146921,fly51fly,fly51fly
9,1017816919980769280,Ravi Dugh,ravidugh


In [10]:
# Join the two dataframes on the "author_id" column
pd.merge(df, df_users, on="author_id")

Unnamed: 0,text,author_id,id,created_at,name,username
0,MIT Graduate-Led Winter Enrichment Opportunity...,467513287,1444857221045760006,2021-10-04T02:49:40.000Z,"Iain Brown, PhD",IainLJBrown
1,"Microbes, Natural Intelligence and Artificial ...",467513287,1444796732462833675,2021-10-03T22:49:18.000Z,"Iain Brown, PhD",IainLJBrown
2,The Automated Knowledge Base Construction - AK...,1390799566253920257,1444854299276857344,2021-10-04T02:38:03.000Z,AINews.com,AINewsDotCom
3,"Microbes, Natural Intelligence and Artificial ...",2477954400,1444842320352870408,2021-10-04T01:50:27.000Z,raja007,raja00710
4,Hire us to do your\nExams\nNursing\nEssays\nHi...,2477954400,1444839654801375236,2021-10-04T01:39:52.000Z,raja007,raja00710
...,...,...,...,...,...,...
95,The Impact of Machine Learning in the FinTech ...,49427323,1444610707098640394,2021-10-03T10:30:06.000Z,"Richard Eudes, PhD",RichardEudes
96,Will AI become an existential threat for us?\n...,181221694,1444603298489716736,2021-10-03T10:00:40.000Z,Vivek Dahiya,MeharVik
97,🚘What are the5⃣levels of #Automation in #SelfD...,1430804322351304708,1444602276983750657,2021-10-03T09:56:36.000Z,Digital Edwin #DigitalEdwin,DigitalEdwyn
98,Buy Google 5 Star Reviews-100% verified\n\nhtt...,1370393914692280322,1444590574435139586,2021-10-03T09:10:06.000Z,usaxoom,usaxoom
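
`pd.merge` performs an inner join by default, so a tweet whose author is absent from the users table would be dropped. A minimal sketch with a left join on sample data follows; the first two author ids and usernames come from the output above, while the third tweet and its author id `"0"` are made up to show the unmatched case.

```python
import pandas as pd

tweets = pd.DataFrame({
    "id": ["1444857221045760006", "1444854299276857344", "1"],
    "author_id": ["467513287", "1390799566253920257", "0"],
    "text": ["sample text 1", "sample text 2", "sample text 3"],
})
users = pd.DataFrame({
    "id": ["467513287", "1390799566253920257"],
    "username": ["IainLJBrown", "AINewsDotCom"],
})

merged = tweets.merge(
    users.rename(columns={"id": "author_id"}),
    on="author_id",
    how="left",               # keep every tweet, even without a matching user
    validate="many_to_one",   # fail if a user appears more than once
)
print(merged[["text", "username"]])
```

The unmatched tweet is kept with a missing `username` instead of disappearing from the result.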


- A list of the texts and links of the tweets in Spanish that contain the hashtags #InteligenciaArtificial or #IA

In [11]:
params = {
    'query': '(#InteligenciaArtificial OR #IA) lang:es -is:retweet',
    'tweet.fields':'created_at,entities',
    'expansions': 'author_id',
    'user.fields':'username',
    'max_results':100
}

response = requests.get(url, headers=headers, params=params)

# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)


df = pd.json_normalize(response.json()['data'])
df.drop(columns=['entities.annotations','entities.hashtags','entities.mentions'],inplace=True)

# Get the indices of the rows where entities.urls is missing
index_to_drop = df[pd.isna(df['entities.urls'])].index

# Drop those rows from the dataframe
df = df.drop(index_to_drop, axis=0)
df

Unnamed: 0,text,author_id,id,created_at,entities.urls
1,#InteligenciaArtificial\n¿Cuáles son las tende...,58899916,1444850775046475780,2021-10-04T02:24:03.000Z,"[{'start': 257, 'end': 280, 'url': 'https://t...."
3,Conversacion con #InteligenciaArtificial male ...,2493556459,1444839592843169795,2021-10-04T01:39:37.000Z,"[{'start': 94, 'end': 117, 'url': 'https://t.c..."
4,La Cámara de Diputados de Brasil aprobó esta s...,1360630913948860419,1444836822618714117,2021-10-04T01:28:36.000Z,"[{'start': 117, 'end': 140, 'url': 'https://t...."
5,Estos son los nuevos #auriculares #gamer #ROGD...,3039681,1444828834352730114,2021-10-04T00:56:52.000Z,"[{'start': 256, 'end': 279, 'url': 'https://t...."
6,"Relevante—episodio 33 🦿🦾🤖🦾, by @RelevanteOk \n...",1443774224896774150,1444827860863176708,2021-10-04T00:53:00.000Z,"[{'start': 251, 'end': 274, 'url': 'https://t...."
...,...,...,...,...,...
95,📚En las 𝗟𝗲𝗰𝘁𝘂𝗿𝗮𝘀 𝗥𝗲𝗰𝗼𝗺𝗲𝗻𝗱𝗮𝗱𝗮𝘀 encontrarás las ...,18079731,1444572931741339651,2021-10-03T08:00:00.000Z,"[{'start': 222, 'end': 245, 'url': 'https://t...."
96,En #Valladolid continúa actuando una de las tr...,555852448,1444569839708823554,2021-10-03T07:47:43.000Z,"[{'start': 257, 'end': 280, 'url': 'https://t...."
97,#LoMásLeídoSeptiembre #ENTREVISTA | ¿Cómo ha e...,19563358,1444558083045294085,2021-10-03T07:01:00.000Z,"[{'start': 139, 'end': 162, 'url': 'https://t...."
98,Escucha nuestro #podcast por #spotify sobre Pr...,1361055709392207878,1444552311301345280,2021-10-03T06:38:04.000Z,"[{'start': 92, 'end': 115, 'url': 'https://t.c..."


In [12]:
# Keep only the first URL of each tweet, which is the part we need
values = []
for v in df['entities.urls'][:]:
    values.append(v[0]['url'])
df['entities.urls'] = values
df

Unnamed: 0,text,author_id,id,created_at,entities.urls
1,#InteligenciaArtificial\n¿Cuáles son las tende...,58899916,1444850775046475780,2021-10-04T02:24:03.000Z,https://t.co/WPtax67Rgs
3,Conversacion con #InteligenciaArtificial male ...,2493556459,1444839592843169795,2021-10-04T01:39:37.000Z,https://t.co/7D0jWhsit1
4,La Cámara de Diputados de Brasil aprobó esta s...,1360630913948860419,1444836822618714117,2021-10-04T01:28:36.000Z,https://t.co/5QXY1RPCB2
5,Estos son los nuevos #auriculares #gamer #ROGD...,3039681,1444828834352730114,2021-10-04T00:56:52.000Z,https://t.co/RzF17NndSo
6,"Relevante—episodio 33 🦿🦾🤖🦾, by @RelevanteOk \n...",1443774224896774150,1444827860863176708,2021-10-04T00:53:00.000Z,https://t.co/lGpYZWjT6p
...,...,...,...,...,...
95,📚En las 𝗟𝗲𝗰𝘁𝘂𝗿𝗮𝘀 𝗥𝗲𝗰𝗼𝗺𝗲𝗻𝗱𝗮𝗱𝗮𝘀 encontrarás las ...,18079731,1444572931741339651,2021-10-03T08:00:00.000Z,https://t.co/tl7QAVaviB
96,En #Valladolid continúa actuando una de las tr...,555852448,1444569839708823554,2021-10-03T07:47:43.000Z,https://t.co/LHEC9j4YVO
97,#LoMásLeídoSeptiembre #ENTREVISTA | ¿Cómo ha e...,19563358,1444558083045294085,2021-10-03T07:01:00.000Z,https://t.co/cJ4OdH2pqG
98,Escucha nuestro #podcast por #spotify sobre Pr...,1361055709392207878,1444552311301345280,2021-10-03T06:38:04.000Z,https://t.co/AeRMTE5OJ7
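
The loop above relies on rows without URLs having been dropped beforehand, and it keeps the shortened `t.co` link. A sketch of a more defensive extraction follows; `first_url` is a hypothetical helper, and the `expanded_url` field is assumed to be present in each URL entity (the helper falls back to `url` when it is not).

```python
def first_url(urls, field="expanded_url"):
    """Return the first link of a tweet's URL entities, or None if there is none."""
    # Missing values show up as NaN (a float), not as a list.
    if not isinstance(urls, list) or not urls:
        return None
    first = urls[0]
    return first.get(field) or first.get("url")


# Sample values shaped like the 'entities.urls' column (made-up links)
samples = [
    [{"start": 0, "end": 23, "url": "https://t.co/a",
      "expanded_url": "https://example.com/a"}],
    [{"start": 0, "end": 23, "url": "https://t.co/b"}],
    float("nan"),
]
print([first_url(value) for value in samples])
```

On a dataframe, the same function could be applied with `df['entities.urls'].apply(first_url)`, which removes the need to drop rows first.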


### Exporting to CSV

We fetch the same data as before, which is paginated, but this time we save all of it to a .csv file.

In [19]:
def get_data(url, params):
    """Fetch every page of results and return them as a single dataframe."""
    results = []
    page_params = dict(params)

    while True:
        response = requests.get(url, headers=headers, params=page_params)
        # Raise an exception if the response is not successful
        if response.status_code != 200:
            raise Exception(response.status_code, response.text)
        payload = response.json()
        results.append(pd.json_normalize(payload['data']))
        meta_data = payload['meta']
        if 'next_token' not in meta_data:
            break
        token = meta_data['next_token']
        print("Current token: ", token)
        # Reuse the caller's parameters and only add the pagination token
        page_params = {**params, 'next_token': token}
    return pd.concat(results)


response = requests.get(url, headers=headers, params=params)

# Raise an exception if the response is not successful
if response.status_code != 200:
    raise Exception(response.status_code, response.text)
    
print("Request metadata", dict(response.json())['meta'])

df = get_data(url, params)

df.drop(columns=['entities.annotations','entities.hashtags','entities.mentions', 'entities.urls'],inplace=True)

df

Request metadata {'newest_id': '1444858979172163586', 'oldest_id': '1444543486947123202', 'result_count': 100, 'next_token': 'b26v89c19zqg8o3fpds7pazr984x9fb9ekkz45h5thh8d'}
Current token:  b26v89c19zqg8o3fpds7pazr984x9fb9ekkz45h5thh8d
Current token:  b26v89c19zqg8o3fpds7p4lkxsji2b0tbkx9oxm1b71bx
Current token:  b26v89c19zqg8o3fpds7ady7r2cc0xlfthrwe9xjwvqt9
Current token:  b26v89c19zqg8o3fpds7advioqdm441xdst8yx81y0ltp
Current token:  b26v89c19zqg8o3fpds7abre7c4dgo7pjih3pxyas156l
Current token:  b26v89c19zqg8o3fpds7a7i8kjho99xkkkn4zf8awww3h
Current token:  b26v89c19zqg8o3fpds7a5f0nyreo0ojiltbbjmeyggal
Current token:  b26v89c19zqg8o3fpds7a5dimrd68ggy9ndqzr6tgoh31
Current token:  b26v89c19zqg8o3fpds7a3aa9i03141v5guhtw9tfgcjh
Current token:  b26v89c19zqg8o3fpds79z14xcl30zbalguaoo800c8zh
Current token:  b26v89c19zqg8o3fpds79yzc3r34292m01lg1fl2m5ocd
Current token:  b26v89c19zqg8o3fpds79wvia5k9vg5p27fn0akmdo58d
Current token:  b26v89c19zqg8o3fpds79ur36cptsnw8znh9q9s127gcd
Current token:  b26v89c19zqg8o3fpd

Unnamed: 0,id,text,author_id,created_at
0,1444858979172163586,Para celebrar el #bicentenario la @AlcaldiaCuc...,171197878,2021-10-04T02:56:39.000Z
1,1444850775046475780,#InteligenciaArtificial\n¿Cuáles son las tende...,58899916,2021-10-04T02:24:03.000Z
2,1444849834951856129,Las sociedades algorítmicas no conquistarán es...,83983853,2021-10-04T02:20:19.000Z
3,1444839592843169795,Conversacion con #InteligenciaArtificial male ...,2493556459,2021-10-04T01:39:37.000Z
4,1444836822618714117,La Cámara de Diputados de Brasil aprobó esta s...,1360630913948860419,2021-10-04T01:28:36.000Z
...,...,...,...,...
75,1442348660801560577,O último The José Antonio Aparecido Tavares-su...,1251328359767986176,2021-09-27T04:41:32.000Z
76,1442346539880386565,No abras este mail. (Tranquilo no es un virus)...,2370333380,2021-09-27T04:33:07.000Z
77,1442335013622722563,Validación de #préstamos basados ​​en activos ...,919011836,2021-09-27T03:47:19.000Z
78,1442334212263186433,"¿Quiénes para compañeros(as) de estudio?\n\n""C...",125850678,2021-09-27T03:44:08.000Z


In [13]:
df.to_csv('tweets_ej.csv')  
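
Because each page is normalized separately, the concatenated dataframe repeats the index values of every page, and `to_csv` writes that index as an extra unnamed column. The sketch below shows one way to avoid this and to read the ids back as strings; it uses an in-memory buffer and two sample rows (ids taken from the output above) instead of the real file.

```python
import io

import pandas as pd

pages = [
    pd.DataFrame({"id": ["1444858979172163586"], "text": ["first page"]}),
    pd.DataFrame({"id": ["1442334212263186433"], "text": ["second page"]}),
]
# ignore_index=True renumbers the rows instead of repeating 0, 1, ... per page.
combined = pd.concat(pages, ignore_index=True)

buffer = io.StringIO()
combined.to_csv(buffer, index=False)  # do not write the index column
buffer.seek(0)

# Read ids as text so they are not converted to integers.
restored = pd.read_csv(buffer, dtype={"id": str})
print(restored)
```

The same arguments apply when writing to a path such as `'tweets_ej.csv'`.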