# Importing Data Locally
Import data stored locally, such as:

1. CSV, Excel, or MATLAB files (.csv, .xls, .mat)
2. Databases (SQLite)
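
The list above mentions SQLite, but none of the cells below touch a database. As a minimal sketch, assuming a made-up `sales` table with hypothetical rows held in memory, Python's built-in `sqlite3` module can query one like this:

```python
import sqlite3

# Hypothetical table and rows, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales (country, units) VALUES (?, ?)",
    [("Canada", 1618), ("Germany", 1321), ("France", 2178)],
)
conn.commit()

# Query the table and fetch every row as a list of tuples.
rows = conn.execute(
    "SELECT country, units FROM sales ORDER BY units DESC"
).fetchall()
conn.close()

for country, units in rows:
    print(country, units)
```

For a real database file, the path would replace `":memory:"`. pandas can also load a query result into a DataFrame with `pd.read_sql_query(query, conn)`.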

## Importing Data with the pandas module

In [None]:
import pandas as pd

# Read the CSV file
df = pd.read_csv('Financial Sample.csv')

# Show the first rows of the DataFrame
print(df.head())


## Importing Data with the csv module

In [None]:
import csv

# Open the CSV file
with open('Financial Sample.csv', newline='') as csvfile:
    # Create a reader over the CSV file
    reader = csv.reader(csvfile)
    # Iterate over the rows of the CSV
    for row in reader:
        print(', '.join(row))  # Print each row
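
The cell above reads each row as a plain list. A possible variation is `csv.DictReader`, which keys every row by the header line. The sketch below uses a small hypothetical CSV string in place of a file on disk, so the column names are assumptions rather than the contents of `Financial Sample.csv`:

```python
import csv
import io

# Hypothetical CSV content standing in for a file on disk.
sample = "Country,Units Sold\nCanada,1618\nGermany,1321\n"

# DictReader uses the first row as field names for every later row.
with io.StringIO(sample, newline="") as csvfile:
    records = list(csv.DictReader(csvfile))

for record in records:
    # Values arrive as strings, so convert them where numbers are needed.
    print(record["Country"], int(record["Units Sold"]))
```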



# Importing data from the web
Importing local data is a useful skill, but it is not always enough, because the data you need may not be available locally. In that case you have to import it from the web.

Here we will learn to:
- Import and locally save datasets from the web
- Load datasets into pandas DataFrames
- Make HTTP requests (GET requests)
- Scrape web data such as HTML
- Parse HTML into useful data (BeautifulSoup)
- Use the urllib and requests packages

## The "urllib" package
- Provides an interface for fetching data across the web
- *urlopen()* accepts URLs instead of file names
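
To illustrate the second point, here is a small sketch of `urlopen()` returning a file-like object. It assumes a `data:` URL with two hypothetical CSV rows so that it runs without network access; with a real `https://` address the reading code would look the same.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen

# A data: URL embeds its own content, so this sketch needs no network access.
# The two CSV rows are hypothetical.
url = "data:text/csv;charset=utf-8," + quote("quality;alcohol\n6;8.8\n")

# The response is read like a binary file, then decoded into text.
with urlopen(url) as response:
    text = response.read().decode("utf-8")

# Parse the downloaded text with the csv module, using ';' as the delimiter.
rows = list(csv.reader(io.StringIO(text), delimiter=";"))
print(rows)
```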

### White Wine

In [None]:
from urllib.request import urlretrieve
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv'
urlretrieve(url, 'winequality-white.csv')

In [None]:
import pandas as pd

# The file uses ';' as the field separator
df = pd.read_csv('winequality-white.csv', sep=';')
print(df.head())

### Red Wine

In [None]:
# Import package
from urllib.request import urlretrieve

# Import pandas
import pandas as pd

# Assign url of file: url
url = 'https://assets.datacamp.com/production/course_1606/datasets/winequality-red.csv'

# Save file locally
urlretrieve(url, 'winequality-red.csv')

# Read file into a DataFrame and print its head
df = pd.read_csv('winequality-red.csv', sep=';')
print(df.head())

## Opening and reading flat files from the web 
You have just imported a file from the web, saved it locally and loaded it into a DataFrame. If you just wanted to load a file from the web into a DataFrame without first saving it locally, you can do that easily using *pandas*. In particular, you can use the function *pd.read_csv()* with the URL as the first argument and the separator *sep* as the second argument.



### The *iloc* indexer
The iloc indexer in pandas selects rows and columns of a DataFrame by integer position.

The expression df.iloc[:, 0].hist() uses iloc to select all rows of the first column of the DataFrame df and then calls the hist() method to plot a histogram of the values in that column.

Here is how it breaks down:

- *df.iloc[:, 0]*: Selects all rows (:) of the first column (0) of the DataFrame df. The first argument (:) selects every row, and the second argument (0) selects the first column.
- *.hist()*: Once the column is selected, hist() plots a histogram of its values.

In [None]:
# Import packages
import matplotlib.pyplot as plt
import pandas as pd

# Assign url of file: url
url = 'https://assets.datacamp.com/production/course_1606/datasets/winequality-red.csv'

# Read file into a DataFrame: df
df = pd.read_csv(url, sep = ';')

# Print the head of the DataFrame
print(df.head())

# Plot first column of df
df.iloc[:, 0].hist()
plt.xlabel('fixed acidity (g(tartaric acid)/dm$^3$)')
plt.ylabel('count')
plt.show()


## Importing non-flat files from the web
Congrats! You've just loaded a flat file from the web into a DataFrame without first saving it locally using the *pandas* function *pd.read_csv()*. This function is super cool because it has close relatives that allow you to load all types of files, not only flat ones. In this interactive exercise, you'll use *pd.read_excel()* to import an Excel spreadsheet.

- Here we read an xls (Excel) file that contains two sheets. To load all sheets, we pass *sheet_name=None*, which returns a dictionary of DataFrames keyed by sheet name.
- The *keys()* method gives the sheet names, *['1700', '1900']*.
- Finally, we print the first 5 rows of the sheet named '1700'.

In [None]:
# Import package
import pandas as pd

# Assign url of file: url
url = 'https://assets.datacamp.com/course/importing_data_into_r/latitude.xls'

# Read in all sheets of Excel file: xls
xls = pd.read_excel(url, sheet_name=None)

# Print the sheetnames to the shell
print(xls.keys())

# Print the head of the first sheet (using its name, NOT its index)
print(xls['1700'].head())

## Performing HTTP requests in Python using urllib
Now that you know the basics behind HTTP GET requests, it's time to perform some of your own. In this interactive exercise, you will ping our very own DataCamp servers to perform a GET request to extract information from the first coding exercise of this course, "https://campus.datacamp.com/courses/1606/4135?ex=2".

In the next exercise, you'll extract the HTML itself. Right now, however, you are going to package and send the request and then catch the response.

In [None]:
# Import packages
from urllib.request import urlopen, Request

# Specify the url
url = "https://campus.datacamp.com/courses/1606/4135?ex=2"

# This packages the request: request
request = Request(url)

# Sends the request and catches the response: response
response = urlopen(request)

# Print the datatype of response
print(type(response))

# Be polite and close the response!
response.close()
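
To make "packaging" the request more concrete, the sketch below builds a `Request` and inspects it without sending anything. The User-Agent value is a hypothetical example, not something the course requires.

```python
from urllib.request import Request

url = "https://campus.datacamp.com/courses/1606/4135?ex=2"

# Building a Request only describes the call; nothing is sent yet.
# The User-Agent value is a hypothetical example.
request = Request(url, headers={"User-Agent": "importing-data-notes/0.1"})

print(request.get_method())  # GET, because no data payload is attached
print(request.host)          # campus.datacamp.com
print(request.selector)      # /courses/1606/4135?ex=2

# Request stores header names capitalized, hence "User-agent" here.
print(request.get_header("User-agent"))
```

The request only goes over the network once it is passed to `urlopen()`.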


You have just packaged and sent a GET request to "https://campus.datacamp.com/courses/1606/4135?ex=2" and then caught the response. You saw that such a response is a http.client.HTTPResponse object. The question remains: what can you do with this response?

Well, as it came from an HTML page, you could read it to extract the HTML and, in fact, such a http.client.HTTPResponse object has an associated read() method. In this exercise, you'll build on your previous great work to extract the response and print the HTML.

In [None]:
# Import packages
from urllib.request import urlopen, Request

# Specify the url
url = "https://campus.datacamp.com/courses/1606/4135?ex=2"

# This packages the request
request = Request(url)

# Sends the request and catches the response: response
response = urlopen(request)

# Extract the response: html
html = response.read()

# Print the html
print(html)

# Be polite and close the response!
response.close()
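
Two details of the cell above are worth a short sketch: `read()` returns bytes rather than text, and a `with` block can take care of closing the response. The example assumes a hypothetical HTML page served from a `data:` URL so it runs offline; the same pattern should apply to a `Request` for a real page.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical HTML served from a data: URL so the sketch runs offline.
page = "<html><head><title>Café</title></head></html>"
url = "data:text/html;charset=utf-8," + quote(page)

# The with block closes the response even if an error occurs inside it.
with urlopen(url) as response:
    raw = response.read()  # bytes
    # Use the charset announced in the headers, falling back to UTF-8.
    charset = response.headers.get_content_charset() or "utf-8"

html = raw.decode(charset)  # str
print(type(raw).__name__, type(html).__name__)
print(html)
```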

Now that you've got your head and hands around making HTTP requests using the urllib package, you're going to figure out how to do the same using the higher-level requests library. You'll once again be pinging DataCamp servers for their "http://www.datacamp.com/teach/documentation" page.

Note that unlike in the previous exercises using urllib, you don't have to close the connection when using requests!

In [None]:
# Import package
import requests

# Specify the url: url
url = "http://www.datacamp.com/teach/documentation"

# Packages the request, send the request and catch the response: r
r = requests.get(url)

# Extract the response: text
text = r.text

# Print the html
print(text)

### Beautiful Soup
Beautiful Soup is a Python library for extracting data from HTML and XML documents, including those with malformed markup. It builds a tree of all the elements in the document, which can then be used to extract information. This makes it useful for web scraping, that is, extracting information from websites.

Beautiful Soup is not itself a parser. It builds the data structures needed to work easily with the output of the parsers it runs on top of, which are not part of the package.
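
To see what a parser alone provides, the sketch below uses Python's built-in `html.parser` directly, not Beautiful Soup. It only receives a stream of tag events, and the `LinkCollector` class and the HTML snippet are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Hypothetical HTML snippet used only for illustration.
html_doc = '<p>See <a href="https://www.python.org/">Python</a> and <a href="/docs">docs</a>.</p>'

collector = LinkCollector()
collector.feed(html_doc)
print(collector.links)
```

Beautiful Soup saves you from writing such event handlers by turning the parser's output into a tree that can be searched.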

### Parsing HTML with BeautifulSoup
In this interactive exercise, you'll learn how to use the BeautifulSoup package to parse, prettify and extract information from HTML. You'll scrape the data from the webpage of Guido van Rossum, Python's very own Benevolent Dictator for Life. In the following exercises, you'll prettify the HTML and then extract the text and the hyperlinks.

In [None]:
# Import packages
import requests
from bs4 import BeautifulSoup

# Specify url: url
url = 'https://www.python.org/~guido/'

# Package the request, send the request and catch the response: r
r = requests.get(url)

# Extracts the response as html: html_doc
html_doc = r.text

# Create a BeautifulSoup object from the HTML, naming the parser explicitly: soup
soup = BeautifulSoup(html_doc, 'html.parser')

# Prettify the BeautifulSoup object: pretty_soup
pretty_soup = soup.prettify()

# Print the response
print(pretty_soup)

###  Turning a webpage into data using BeautifulSoup: getting the text
As promised, in the following exercises, you'll learn the basics of extracting information from HTML soup. In this exercise, you'll figure out how to extract the text from the BDFL's webpage, along with printing the webpage's title.

In [None]:
# Import packages
import requests
from bs4 import BeautifulSoup

# Specify url: url
url = 'https://www.python.org/~guido/'

# Package the request, send the request and catch the response: r
r = requests.get(url)

# Extract the response as html: html_doc
html_doc = r.text

# Create a BeautifulSoup object from the HTML, naming the parser explicitly: soup
soup = BeautifulSoup(html_doc, 'html.parser')

# Get the title of Guido's webpage: guido_title
guido_title = soup.title

# Print the title of Guido's webpage to the shell
print(guido_title)

# Get Guido's text: guido_text
guido_text = soup.get_text()

# Print Guido's text to the shell
print(guido_text)

### Turning a webpage into data using BeautifulSoup: getting the hyperlinks
In this exercise, you'll figure out how to extract the URLs of the hyperlinks from the BDFL's webpage. In the process, you'll become close friends with the soup method find_all().

Use the method *find_all()* to find all hyperlinks in soup, remembering that hyperlinks are defined by the HTML tag 'a' but passed to *find_all()* without angle brackets; store the result in the variable *a_tags*.
The variable *a_tags* is a result set: your job now is to iterate over it with a for loop and print the actual URLs of the hyperlinks; to do this, for every element *link* in *a_tags*, call *print(link.get('href'))*.

In [None]:
# Import packages
import requests
from bs4 import BeautifulSoup

# Specify url
url = 'https://www.python.org/~guido/'

# Package the request, send the request and catch the response: r
r = requests.get(url)

# Extracts the response as html: html_doc
html_doc = r.text

# Create a BeautifulSoup object from the HTML, naming the parser explicitly: soup
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the title of Guido's webpage
print(soup.title)

# Find all 'a' tags (which define hyperlinks): a_tags
a_tags = soup.find_all('a')

# Print the URLs to the shell
for link in a_tags:
    print(link.get('href'))

# JSON

JSON (short for JavaScript Object Notation) is a simple text format for data interchange. It is a subset of JavaScript's object literal notation, although, because of its wide adoption as an alternative to XML, it is considered a language-independent format.
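
As a quick sketch of how JSON text maps onto Python objects, the example below parses a small hypothetical JSON string with the built-in `json` module. The field names are made up for illustration.

```python
import json

# Hypothetical JSON text; the field names are made up for illustration.
raw = '{"title": "Example", "year": 2010, "genres": ["Drama"], "sequel": null}'

# loads() turns JSON text into Python objects (dict, list, str, int, None).
data = json.loads(raw)
print(data["title"], data["year"], data["genres"][0], data["sequel"])

# dumps() goes the other way, producing JSON text from Python objects.
print(json.dumps(data, indent=2))
```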


## Loading and exploring a JSON
Now that you know what a JSON is, you'll load one into your Python environment and explore it yourself. Here, you'll load the JSON 'a_movie.json' into the variable json_data, which will be a dictionary. You'll then explore the JSON contents by printing the key-value pairs of json_data to the shell.

In [None]:
# Import the json module
import json

# Load JSON: json_data
with open("a_movie.json") as json_file:
    json_data = json.load(json_file)

# Print each key-value pair in json_data
for k in json_data.keys():
    print(k + ': ', json_data[k])

## APIs
An API is a set of protocols and routines for building and interacting with software applications. It lets two programs communicate with each other. For example, to stream Twitter data in Python you would use the Twitter API, and to automate extracting and processing information from Wikipedia you could use the Wikipedia API.


In [None]:
import requests
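# Query string from the course video; per the note below, OMDB now also requires an apikey argument
url = 'https://www.omdbapi.com/?t=hackers'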
r = requests.get(url)
json_data = r.json()
print(json_data)
for key, value in json_data.items():
    print(key + ':', value)


### API requests
Now it's your turn to pull some movie data down from the Open Movie Database (OMDB) using their API. The movie you'll query the API about is The Social Network. Recall that, in the video, to query the API about the movie Hackers, Hugo's query string was 'http://www.omdbapi.com/?t=hackers' and had a single argument t=hackers.

Note: recently, OMDB has changed their API: you now also have to specify an API key. This means you'll have to add another argument to the URL: apikey=72bc447a.

Instructions:

- Import the requests package.
- Assign to the variable url the URL of interest in order to query 'http://www.omdbapi.com' for the data corresponding to the movie The Social Network. The query string should have two arguments: apikey=72bc447a and t=the+social+network. You can combine them as follows: apikey=72bc447a&t=the+social+network.
- Print the text of the response object r by using its text attribute and passing the result to the print() function.
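
Instead of typing the query string by hand, it can be assembled from a dictionary. This is only a sketch of one way to do it with the standard library, using the parameter values from the instructions above:

```python
from urllib.parse import urlencode

base_url = "http://www.omdbapi.com/"

# Query parameters from the instructions above.
params = {"apikey": "72bc447a", "t": "the social network"}

# urlencode() joins the pairs with '&' and encodes spaces as '+'.
url = base_url + "?" + urlencode(params)
print(url)  # http://www.omdbapi.com/?apikey=72bc447a&t=the+social+network
```

The requests package can also build the query string for you if the dictionary is passed as `requests.get(base_url, params=params)`.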

In [1]:
# Import requests package
import requests

# Assign URL to variable: url
url = 'http://www.omdbapi.com/?apikey=72bc447a&t=the+social+network'

# Package the request, send the request and catch the response: r
r = requests.get(url)

# Print the text of the response
print(r.text)

# Decode the JSON data into a dictionary: json_data
json_data = r.json()

for k, value in json_data.items():
    print(k + ':', value)


{"Title":"The Social Network","Year":"2010","Rated":"PG-13","Released":"01 Oct 2010","Runtime":"120 min","Genre":"Biography, Drama","Director":"David Fincher","Writer":"Aaron Sorkin, Ben Mezrich","Actors":"Jesse Eisenberg, Andrew Garfield, Justin Timberlake","Plot":"As Harvard student Mark Zuckerberg creates the social networking site that would become known as Facebook, he is sued by the twins who claimed he stole their idea and by the co-founder who was later squeezed out of the business.","Language":"English, French","Country":"United States","Awards":"Won 3 Oscars. 173 wins & 187 nominations total","Poster":"https://m.media-amazon.com/images/M/MV5BOGUyZDUxZjEtMmIzMC00MzlmLTg4MGItZWJmMzBhZjE0Mjc1XkEyXkFqcGdeQXVyMTMxODk2OTU@._V1_SX300.jpg","Ratings":[{"Source":"Internet Movie Database","Value":"7.8/10"},{"Source":"Rotten Tomatoes","Value":"96%"},{"Source":"Metacritic","Value":"95/100"}],"Metascore":"95","imdbRating":"7.8","imdbVotes":"754,796","imdbID":"tt1285016","Type":"movie","DVD

### Checking out the Wikipedia API
You're doing so well and having so much fun that we're going to throw one more API at you: the Wikipedia API. You'll figure out how to find and extract information from the Wikipedia page for Pizza. What gets a bit wild here is that your query will return nested JSON, that is, JSON objects inside JSON objects, but Python can handle that because it translates them into dictionaries within dictionaries.

The URL that requests the relevant query from the Wikipedia API is https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=pizza

Instructions: 
- Assign the relevant URL to the variable url.
- Apply the json() method to the response object r and store the resulting dictionary in the variable json_data.
- The variable pizza_extract holds the HTML of an extract from Wikipedia's Pizza page as a string; use the function print() to print this string to the shell.

In [None]:
# Import package
import requests

# Assign URL to variable: url
url = 'https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=pizza'

# Package the request, send the request and catch the response: r
r = requests.get(url)

# Decode the JSON data into a dictionary: json_data
json_data = r.json()

# Print the Wikipedia page extract
pizza_extract = json_data['query']['pages']['24768']['extract']
print(pizza_extract)
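
The cell above hard-codes the page ID '24768'. A possible alternative, sketched here on a hand-written stand-in dictionary (the extract text is a placeholder, not real API output), is to walk the nested dictionaries and take the first page whatever its ID is:

```python
# Hand-written stand-in shaped like the nested response above;
# the extract text is a placeholder, not real API output.
json_data = {
    "query": {
        "pages": {
            "24768": {"title": "Pizza", "extract": "<p>placeholder extract</p>"}
        }
    }
}

# Walk down one level at a time: each [] lookup returns the next inner dict.
pages = json_data["query"]["pages"]

# Take the first page without hard-coding its ID.
page_id, page = next(iter(pages.items()))
print(page_id, page["title"])
print(page["extract"])
```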


### The Twitter API and Authentication
As a final deep dive, you're going to stream data from the Twitter API. You'll learn how to filter incoming tweets for keywords, and you'll learn about the principles of API authentication and OAuth. You'll also learn the basics of the package tweepy, which many people in PythonLand use to interact with the Twitter API.

### Streaming tweets
It's time to stream some tweets! Your task is to create the Stream object and to filter tweets according to particular keywords. tweepy has been imported for you.

Instructions: 
- Create your Stream object with the credentials given.
- Filter your Stream variable for the keywords "clinton", "trump", "sanders", and "cruz".

In [None]:
import tweepy, json

# Store credentials in relevant variables
consumer_key = "nZ6EA0FxZ293SxGNg8g8aP0HM"
consumer_secret = "fJGEodwe3KiKUnsYJC3VRndj7jevVvXbK2D5EiJ2nehafRgA6i"
access_token = "1092294848-aHN7DcRP9B4VMTQIhwqOYiB14YkW92fFO8k8EPy"
access_token_secret = "X4dHmhPfaksHcQ7SCbmZa2oYBBVSD2g8uIHXsp5CTaksx"

# Create your Stream object with credentials
stream = tweepy.Stream(consumer_key, consumer_secret, access_token, access_token_secret)

# Filter your Stream variable
stream.filter(track = ['clinton', 'trump', 'sanders', 'cruz'])
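
Credentials written directly into a notebook are easy to leak. As a hedged sketch of one common alternative, the four OAuth values could be read from environment variables. The variable names and the `load_credentials` helper below are hypothetical, chosen only for this example.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Return the four OAuth values, or raise if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in REQUIRED_VARS)


# Demonstration with a plain dict standing in for the real environment.
fake_env = {name: "placeholder-" + str(i) for i, name in enumerate(REQUIRED_VARS)}
consumer_key, consumer_secret, access_token, access_token_secret = load_credentials(fake_env)
print(consumer_key)
```

In a real session, calling `load_credentials()` with no argument reads from the actual environment.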

In [5]:
import tweepy
import pandas as pd

consumer_key = "KVRVfDFf0oFPtK94pPMd0ZjEu" #Your API/Consumer key 
consumer_secret = "ZkTM47w3pQjDMN98qNwHubayxRP8x9mx3Je1ucVAsfPnnpN8I9" #Your API/Consumer Secret Key
access_token = "1056483163-YIq6v2LCsQLwE0dZz9bOlhdaccnXlIV8HUNpk9Y"    #Your Access token key
access_token_secret = "89ncfIq3657XzGJ7Vgod34cRCIs6fBXV2FBFhcgrT6SV4" #Your Access token Secret key

# Pass in our Twitter API authentication keys
auth = tweepy.OAuth1UserHandler(
    consumer_key, consumer_secret,
    access_token, access_token_secret
)

# Instantiate the tweepy API
api = tweepy.API(auth, wait_on_rate_limit=True)


search_query = "'Elon Musk''fired'-filter:retweets AND -filter:replies AND -filter:links"
no_of_tweets = 100

try:
    # Retrieve up to no_of_tweets tweets matching the search query
    tweets = api.search_tweets(q=search_query, lang="en", count=no_of_tweets, tweet_mode='extended')

    # Pull selected attributes from each tweet
    attributes_container = [[tweet.user.name, tweet.created_at, tweet.favorite_count, tweet.source, tweet.full_text] for tweet in tweets]

    # Column names for the DataFrame
    columns = ["User", "Date Created", "Number of Likes", "Source of Tweet", "Tweet"]

    # Create the DataFrame
    tweets_df = pd.DataFrame(attributes_container, columns=columns)
except Exception as e:
    # Catch Exception rather than BaseException so Ctrl+C still interrupts the cell
    print('Status Failed On,', str(e))

Status Failed On, 403 Forbidden
453 - You currently have access to a subset of Twitter API v2 endpoints and limited v1.1 endpoints (e.g. media post, oauth) only. If you need access to this endpoint, you may need a different access level. You can learn more here: https://developer.twitter.com/en/portal/product


In [3]:
import tweepy

In [4]:
print(tweepy.__version__)

4.10.0


In [1]:
!pip list

Package            Version
------------------ -----------
asttokens          2.4.1
beautifulsoup4     4.12.3
certifi            2024.2.2
charset-normalizer 3.3.2
colorama           0.4.6
comm               0.2.2
contourpy          1.2.1
cycler             0.12.1
debugpy            1.8.1
decorator          5.1.1
executing          2.0.1
fonttools          4.51.0
idna               3.7
ipykernel          6.29.4
ipython            8.23.0
jedi               0.19.1
jupyter_client     8.6.1
jupyter_core       5.7.2
kiwisolver         1.4.5
matplotlib         3.8.4
matplotlib-inline  0.1.6
nest-asyncio       1.6.0
numpy              1.26.4
oauthlib           3.2.2
opencv-python      4.9.0.80
packaging          24.0
pandas             2.2.2
parso              0.8.4
pillow             10.3.0
pip                24.0
platformdirs       4.2.0
prompt-toolkit     3.0.43
psutil             5.9.8
pure-eval          0.2.2
Pygments           2.17.2
pyparsing          3.1.2
python-dateutil    2.9.0.post0

In [2]:
import requests

In [15]:
# Set up the URL endpoints
joke_url = 'https://icanhazdadjoke.com'
iss_url = 'https://api.wheretheiss.at/v1/satellites/25544'

# Set up a header: metadata sent with the API request
my_header = {'Accept':'application/json'}

# API call
results = requests.get(joke_url, headers=my_header)

In [16]:
json_result = results.json()

In [17]:
print(json_result)

{'id': 'bprz5wXSSvc', 'joke': 'Why did the scarecrow win an award? Because he was outstanding in his field.', 'status': 200}


In [19]:
json_result['joke']

'Why did the scarecrow win an award? Because he was outstanding in his field.'

In [21]:
# Make the ISS API call
iss_results = requests.get(iss_url, headers=my_header)
iss_results.json()

{'name': 'iss',
 'id': 25544,
 'latitude': 4.7297815024866,
 'longitude': 121.52644698227,
 'altitude': 414.86399404544,
 'velocity': 27592.701935199,
 'visibility': 'daylight',
 'footprint': 4481.2409190064,
 'timestamp': 1713126751,
 'daynum': 2460415.3559144,
 'solar_lat': 9.8169152542643,
 'solar_lon': 231.89044275785,
 'units': 'kilometers'}
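
Once the response is a dictionary, its fields can be used like any other Python values. The sketch below copies a few values from the output above and converts the timestamp, assumed here to be Unix seconds (which agrees with the `daynum` field), into a readable UTC datetime.

```python
from datetime import datetime, timezone

# Values copied from the ISS response printed above.
iss = {
    "latitude": 4.7297815024866,
    "longitude": 121.52644698227,
    "velocity": 27592.701935199,
    "timestamp": 1713126751,
    "units": "kilometers",
}

# Assumption: the timestamp is in Unix seconds; convert it to an aware UTC datetime.
observed_at = datetime.fromtimestamp(iss["timestamp"], tz=timezone.utc)
print(observed_at.isoformat())  # 2024-04-14T20:32:31+00:00

# Assumption: with units of kilometers, velocity is read here as km/h.
print(f"{iss['latitude']:.2f}, {iss['longitude']:.2f} at {iss['velocity']:.0f} km/h")
```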