# Data Loading, Storage, and File Formats

Input and output generally fall into a few main categories: reading text files and other, more efficient on-disk formats, loading data from databases, and interacting with network sources such as web APIs.

# Reading and Writing CSV Data

pandas provides a number of functions for reading tabular data as a DataFrame object. The table below summarizes some of them, although read_csv and read_table are probably the ones you will use most.

![lecturaARC.png](attachment:lecturaARC.png)
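`read_table` behaves like `read_csv`, except that its default delimiter is a tab instead of a comma. A minimal sketch, using a small hypothetical tab-separated string held in memory with `io.StringIO` instead of a file on disk:

```python
import io

import pandas as pd

# Hypothetical tab-separated text; io.StringIO lets pandas read it like a file
tsv_text = "name\tvalue\nalpha\t1\nbeta\t2\n"

# read_table defaults to sep="\t"
via_table = pd.read_table(io.StringIO(tsv_text))

# read_csv gives the same result once the tab delimiter is passed explicitly
via_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(via_table.equals(via_csv))
```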

In [1]:
import pandas as pd
# Handling CSV ("comma-separated values") files
mydataset = pd.read_csv('Store_CA.csv')
mydataset

Unnamed: 0,ProductVariety,MarketingSpend,CustomerFootfall,StoreSize,EmployeeEfficiency,StoreAge,CompetitorDistance,PromotionsCount,EconomicIndicator,StoreLocation,StoreCategory,MonthlySalesRevenue
0,581,29,1723,186,84.9,1,12,6,108.3,Los Angeles,Electronics,284.90
1,382,31,1218,427,75.8,18,11,6,97.8,Los Angeles,Electronics,308.21
2,449,35,2654,142,92.8,14,11,6,101.1,Los Angeles,Grocery,292.11
3,666,9,2591,159,66.3,11,11,4,115.1,Sacramento,Clothing,279.61
4,657,35,2151,275,89.1,28,12,7,93.4,Palo Alto,Electronics,359.71
...,...,...,...,...,...,...,...,...,...,...,...,...
1645,295,15,2681,235,58.5,15,10,5,88.7,Sacramento,Clothing,273.55
1646,761,8,1398,456,78.5,26,14,4,95.1,San Francisco,Clothing,432.82
1647,405,21,1490,465,76.7,18,12,5,73.0,Los Angeles,Clothing,303.52
1648,359,41,2042,350,67.6,2,6,7,105.0,Palo Alto,Clothing,241.39
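For a larger file like this one, a few optional `read_csv` parameters help load only what is needed: `usecols` selects columns, `nrows` limits the number of data rows, and `dtype` sets column types. A minimal sketch on a hypothetical in-memory excerpt built from the first rows shown above (the real file has more columns):

```python
import io

import pandas as pd

# Hypothetical excerpt built from the first rows shown above (not the full file)
csv_text = """StoreLocation,StoreCategory,MonthlySalesRevenue
Los Angeles,Electronics,284.90
Los Angeles,Electronics,308.21
Los Angeles,Grocery,292.11
Sacramento,Clothing,279.61
"""

subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["StoreCategory", "MonthlySalesRevenue"],  # load only these columns
    nrows=3,                                            # stop after 3 data rows
    dtype={"StoreCategory": "category"},               # store repeated labels compactly
)
print(subset)
```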


In [2]:
ejemplo2 = pd.read_csv('ex2.txt', sep=',')  # include the file extension; by default the first row becomes the header
ejemplo2

Unnamed: 0,1,2,3,4,hello
0,5,6,7,8,world
1,9,10,11,12,foo


In [None]:
# What if the file has no column labels?


In [3]:
ejemplo2 = pd.read_csv('ex2.txt', sep=',', header=None)  # header=None: the first row is data; columns are numbered 0, 1, 2, ...
ejemplo2

Unnamed: 0,0,1,2,3,4
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [4]:
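# Alternative: supply the column names with names=
# Only 4 names for 5 fields, so pandas uses the first field as the row index
pd.read_csv('ex2.txt', sep=',', names=['a', 'b', 'c', 'd'])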

Unnamed: 0,a,b,c,d
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,foo
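To keep every field as a column, pass one name per field and choose the index explicitly with `index_col`. A sketch in which the contents of `ex2.txt` are reproduced inline from the `header=None` output above; the column name `message` is a hypothetical label:

```python
import io

import pandas as pd

# Contents of ex2.txt as shown in the header=None output above
ex2_text = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"

# One name per field; "message" is an illustrative label for the text column
names = ["a", "b", "c", "d", "message"]
df = pd.read_csv(io.StringIO(ex2_text), names=names, index_col="message")
print(df)
```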


In [5]:
import numpy as np

# Exporting datasets: build a small DataFrame of random values
frame_numpy = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame_numpy

Unnamed: 0,b,d,e
Utah,0.646098,-0.265424,-1.205161
Ohio,0.588058,1.091293,-0.612938
Texas,-1.336902,-0.80356,0.065854
Oregon,0.282102,-0.237645,-1.863364


In [6]:
frame_numpy.to_csv('USA_states.csv')
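`to_csv` writes the row index as the first column unless `index=False` is passed, and `sep` changes the delimiter. When no path is given it returns the CSV text instead of writing a file, which makes the effect of each option easy to inspect. A sketch with a small hypothetical frame so the output is deterministic:

```python
import io

import pandas as pd

# Small deterministic frame (hypothetical values) so the output is predictable
frame = pd.DataFrame({"b": [1, 2], "d": [3, 4]}, index=["Utah", "Ohio"])

# With no path, to_csv returns the CSV text instead of writing a file
print(frame.to_csv())

# Custom delimiter and no index column
pipe_text = frame.to_csv(sep="|", index=False)
print(pipe_text)

# Round trip: index_col=0 turns the first column back into the row index
restored = pd.read_csv(io.StringIO(frame.to_csv()), index_col=0)
print(restored)
```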

# JSON Data

JSON (short for JavaScript Object Notation) has become one of the standard formats for sending data via HTTP requests between web browsers and other applications. It is a much more free-form data format than a tabular text format such as CSV.

In [7]:
obj = """
 {"name": "Wes",
 "places_lived": ["United States", "Spain", "Germany"],
 "pet": null,
 "siblings": [{"name": "Scott", "age": 30, "pets": ["Zeus", "Zuko"]},
              {"name": "Katie", "age": 38,
               "pets": ["Sixes", "Stache", "Cisco"]}]
 }
 """

In [8]:
obj

'\n {"name": "Wes",\n "places_lived": ["United States", "Spain", "Germany"],\n "pet": null,\n "siblings": [{"name": "Scott", "age": 30, "pets": ["Zeus", "Zuko"]},\n              {"name": "Katie", "age": 38,\n               "pets": ["Sixes", "Stache", "Cisco"]}]\n }\n '

In [9]:
import json

result = json.loads(obj)

In [10]:
result

{'name': 'Wes',
 'places_lived': ['United States', 'Spain', 'Germany'],
 'pet': None,
 'siblings': [{'name': 'Scott', 'age': 30, 'pets': ['Zeus', 'Zuko']},
  {'name': 'Katie', 'age': 38, 'pets': ['Sixes', 'Stache', 'Cisco']}]}
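`json.dumps` converts a Python object back into a JSON string, and a list of dicts such as `result['siblings']` can be passed straight to the `DataFrame` constructor, optionally keeping only some of the keys. A minimal sketch with a shortened version of the same object:

```python
import json

import pandas as pd

# Shortened version of the JSON text above
obj = """
{"name": "Wes",
 "pet": null,
 "siblings": [{"name": "Scott", "age": 30, "pets": ["Zeus", "Zuko"]},
              {"name": "Katie", "age": 38, "pets": ["Sixes", "Stache", "Cisco"]}]}
"""
result = json.loads(obj)

# Python -> JSON text (None becomes null again)
as_json = json.dumps(result)
print(as_json)

# List of dicts -> DataFrame, keeping only the chosen keys as columns
siblings = pd.DataFrame(result["siblings"], columns=["name", "age"])
print(siblings)
```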

In [11]:
dataSET = pd.read_json('ejJSON.json')
dataSET

Unnamed: 0,squadName,homeTown,formed,secretBase,active,members
0,Super hero squad,Metro City,2016,Super tower,True,"{'name': 'Molecule Man', 'age': 29, 'secretIde..."
1,Super hero squad,Metro City,2016,Super tower,True,"{'name': 'Madame Uppercut', 'age': 39, 'secret..."
2,Super hero squad,Metro City,2016,Super tower,True,"{'name': 'Eternal Flame', 'age': 1000000, 'sec..."
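Here each element of `members` stays as a nested dict inside a single column. `pd.json_normalize` can expand such nested records into regular columns, repeating selected top-level fields on every row. A sketch with a reduced, hypothetical version of `ejJSON.json` that keeps only the member fields fully visible above:

```python
import pandas as pd

# Reduced version of the squad record; only name and age are kept per member
squad = {
    "squadName": "Super hero squad",
    "homeTown": "Metro City",
    "formed": 2016,
    "members": [
        {"name": "Molecule Man", "age": 29},
        {"name": "Madame Uppercut", "age": 39},
        {"name": "Eternal Flame", "age": 1000000},
    ],
}

# One row per member; squadName and homeTown are repeated on every row
members = pd.json_normalize(squad, record_path="members", meta=["squadName", "homeTown"])
print(members)
```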


In [12]:
import numpy as np

# Exporting datasets: build a small DataFrame of random values
frame_numpy = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame_numpy

Unnamed: 0,b,d,e
Utah,-0.828894,1.397706,-0.589572
Ohio,-0.295708,-0.796115,0.400422
Texas,0.46017,-1.711352,-0.112115
Oregon,0.293147,-0.156717,-0.304685


In [13]:
print(frame_numpy.to_json())

{"b":{"Utah":-0.828894001,"Ohio":-0.2957082104,"Texas":0.4601703293,"Oregon":0.2931467962},"d":{"Utah":1.397706158,"Ohio":-0.7961151624,"Texas":-1.7113518631,"Oregon":-0.1567166442},"e":{"Utah":-0.5895724125,"Ohio":0.4004224261,"Texas":-0.1121150514,"Oregon":-0.3046854445}}


In [14]:
frame_numpy.to_json()  # returns a JSON string; it does not write a JSON file by itself

'{"b":{"Utah":-0.828894001,"Ohio":-0.2957082104,"Texas":0.4601703293,"Oregon":0.2931467962},"d":{"Utah":1.397706158,"Ohio":-0.7961151624,"Texas":-1.7113518631,"Oregon":-0.1567166442},"e":{"Utah":-0.5895724125,"Ohio":0.4004224261,"Texas":-0.1121150514,"Oregon":-0.3046854445}}'
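The `orient` parameter controls how the DataFrame is laid out in JSON, and passing a path (for example a hypothetical `USA_states.json`) writes a file instead of returning a string. A sketch with a small deterministic frame; `pd.read_json` reads the default layout back, with the literal text wrapped in `io.StringIO`:

```python
import io
import json

import pandas as pd

# Deterministic hypothetical frame
frame = pd.DataFrame({"b": [1.5, -0.25]}, index=["Utah", "Ohio"])

# Default orient for a DataFrame is "columns": {column: {index: value}}
print(frame.to_json())

# orient="records": a list with one object per row (the index is dropped)
records = frame.to_json(orient="records")
print(records)

# orient="index": {index: {column: value}}
by_index = frame.to_json(orient="index")
print(by_index)

# To create a file instead, pass a path: frame.to_json("USA_states.json")
# Reading back: wrap literal JSON text in StringIO
restored = pd.read_json(io.StringIO(frame.to_json()))
print(restored)
```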

# Reading Excel Files

In [15]:
# pandas reads Excel files with read_excel. Legacy .xls (Excel 97-2003) files need the xlrd package; .xlsx files need openpyxl
dataEXCEL = pd.read_excel('censoAGR_SCZ.xls')

In [16]:
%pip install xlrd

Note: you may need to restart the kernel to use updated packages.
Defaulting to user installation because normal site-packages is not writeable



In [17]:
dataEXCEL = pd.read_excel('Bolivia - Produccion Año Agricola por Departamento, 1984 - 2023.xlsx')

In [18]:
dataEXCEL

Unnamed: 0.1,Unnamed: 0,DESCRIPCIÓN,1983-1984,1984-1985,1985-1986,1986-1987,1987-1988,1988-1989,1989-1990,1990-1991,...,2013-2014,2014-2015,2015-2016,2016-2017,2017-2018,2018-2019,2019-2020(p),2020-2021(p),2021-2022(p),2022-2023(p)
0,,,,,,,,,,,...,,,,,,,,,,
1,,CEREALES,625646.0,738127.0,566038.0,552229.0,582737.0,586460.0,633351.0,808957.0,...,2449393.0,2934920.0,2660494.0,2279134.0,3267425.0,2.915388e+06,2.914459e+06,3.741163e+06,2.615641e+06,3.226597e+06
2,,Maíz en grano (1),343893.0,368864.0,288771.0,264694.0,268405.0,262023.0,265481.0,359013.0,...,1006622.0,1056557.0,984628.0,954909.0,1260926.0,9.875032e+05,9.548330e+05,1.250330e+06,9.856208e+05,1.257459e+06
3,,Arroz con cáscara,157737.0,199277.0,130031.0,165892.0,172445.0,170070.0,217349.0,216053.0,...,484057.0,527341.0,406954.0,478578.0,541157.0,6.000443e+05,4.874274e+05,5.501815e+05,6.086807e+05,6.985436e+05
4,,Sorgo en grano (1),25796.0,60173.0,47466.0,18677.0,37561.0,43100.0,41162.0,79800.0,...,656494.0,858101.0,802203.0,556868.0,1023314.0,9.490391e+05,1.018885e+06,1.481332e+06,5.953367e+05,8.864433e+05
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89,,Censo Nacional Agropecuario-2013,,,,,,,,,...,,,,,,,,,,
90,,Encuesta Agropecuaria - 2015,,,,,,,,,...,,,,,,,,,,
91,,(1) Incluye la campaña de invierno...,,,,,,,,,...,,,,,,,,,,
92,,Nota: La información correspon...,,,,,,,,,...,,,,,,,,,,


In [19]:
# To read a specific sheet (here, the Chuquisaca department), pass sheet_name
data_CHU = pd.read_excel('Bolivia - Produccion Año Agricola por Departamento, 1984 - 2023.xlsx', sheet_name = 'Chuquisaca')

In [20]:
data_CHU

Unnamed: 0.1,Unnamed: 0,DESCRIPCIÓN,1983-1984,1984-1985,1985-1986,1986-1987,1987-1988,1988-1989,1989-1990,1990-1991,...,2013-2014,2014-2015,2015-2016,2016-2017,2017-2018,2018-2019,2019-2020(p),2020-2021(p),2021-2022(p),2022-2023(p)
0,,,,,,,,,,,...,,,,,,,,,,
1,,CEREALES,53014.0,52564.0,51103.0,53643.0,51559.0,52383.0,53650.0,51151.0,...,101260.0,112985.0,119154.0,127291.0,136440.0,123710.0,126262.215597,116534.936300,85732.472342,163849.362481
2,,Arroz con cáscara,201.0,206.0,228.0,225.0,237.0,271.0,272.0,257.0,...,396.0,456.0,465.0,391.0,392.0,388.0,410.259564,388.006217,393.976663,398.596447
3,,Avena,178.0,187.0,206.0,230.0,257.0,318.0,257.0,355.0,...,680.0,674.0,724.0,718.0,679.0,664.0,687.876607,664.078555,682.581888,700.561630
4,,Cebada en grano,3654.0,3745.0,3687.0,4430.0,4312.0,3961.0,4011.0,4403.0,...,7224.0,6626.0,6828.0,6735.0,6854.0,6783.0,6774.507598,7039.595025,6835.624624,6779.580838
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
82,,Censo Nacional Agropecuario-2013,,,,,,,,,...,,,,,,,,,,
83,,Encuesta Agropecuaria - 2015,,,,,,,,,...,,,,,,,,,,
84,,(1) Incluye la campaña de invierno...,,,,,,,,,...,,,,,,,,,,
85,,Nota: La información correspon...,,,,,,,,,...,,,,,,,,,,


In [21]:
# Export a DataFrame to Excel
import numpy as np

# Exporting datasets: build a small DataFrame of random values
frame_numpy = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame_numpy

Unnamed: 0,b,d,e
Utah,-0.584721,1.1571,0.772417
Ohio,-0.366041,1.242216,-0.254708
Texas,-0.54511,0.653498,-0.283276
Oregon,-0.956387,-0.681825,-0.191984


In [23]:
frame_numpy.to_excel('USAstates.xlsx')  # writing .xlsx requires an Excel writer engine such as openpyxl or XlsxWriter

# Interacting with APIs

**Brief definition**

APIs are mechanisms that allow two software components to communicate with each other through a set of definitions and protocols.

In [24]:
import requests
url = "https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY"
resp = requests.get(url)
resp

<Response [200]>
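Instead of writing the query string by hand, `requests.get` accepts it as a `params` dict; a `timeout` avoids waiting forever, and `raise_for_status()` turns 4xx/5xx responses into exceptions. A sketch in which `fetch_apod` is a hypothetical helper; only the URL construction is executed (via `requests.Request(...).prepare()`), so no network call is made:

```python
import requests

APOD_URL = "https://api.nasa.gov/planetary/apod"


def fetch_apod(api_key="DEMO_KEY", timeout=10):
    """Hypothetical helper: request APOD and return the parsed JSON body."""
    resp = requests.get(APOD_URL, params={"api_key": api_key}, timeout=timeout)
    resp.raise_for_status()  # raises requests.HTTPError for 4xx/5xx codes
    return resp.json()


# Build (without sending) the request to see the URL that params produces
prepared = requests.Request("GET", APOD_URL, params={"api_key": "DEMO_KEY"}).prepare()
print(prepared.url)
```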

**Response types**
These are the HTTP status codes that can be returned when we make requests to an API. According to MDN Web Docs, they are grouped into the following ranges:

100 - 199 → informational responses.

200 - 299 → successful responses.

300 - 399 → redirection messages.

400 - 499 → client errors.

500 - 599 → server errors.
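In `requests`, the numeric code is available as `resp.status_code`. A minimal sketch that maps a code to the ranges listed above; `status_class` is a hypothetical helper, and the standard library's `http.HTTPStatus` supplies the reason phrase:

```python
from http import HTTPStatus

# Ranges as listed above
RANGES = [
    (100, 199, "informational"),
    (200, 299, "success"),
    (300, 399, "redirection"),
    (400, 499, "client error"),
    (500, 599, "server error"),
]


def status_class(code):
    """Hypothetical helper: return the category of an HTTP status code."""
    for low, high, label in RANGES:
        if low <= code <= high:
            return label
    raise ValueError(f"{code} is outside the 100-599 range")


for code in (200, 301, 404, 503):
    # HTTPStatus(code).phrase gives the standard reason phrase, e.g. "Not Found"
    print(code, status_class(code), HTTPStatus(code).phrase)
```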

In [25]:
# .json() parses the JSON response body into native Python objects (here, a dict)
data = resp.json()

In [26]:
data

{'copyright': 'Purple Mountain (Tsuchinshan)',
 'date': '2024-12-06',
 'explanation': "Colorful and bright, this streaking fireball meteor was captured in a single exposure taken at Purple Mountain (Tsuchinshan) Observatory’s Xuyi Station in 2020, during planet Earth's annual Perseid meteor shower. The dome in the foreground houses the China Near Earth Object Survey Telescope (CNEOST), the largest multi-purpose Schmidt telescope in China. Located in Xuyi County, Jiangsu Province, the station began its operation as an extension of China's Purple Mountain Observatory in 2006. Darling of planet Earth's night skies in 2024, the bright comet designated Tsuchinshan-ATLAS (C/2023 A3) was discovered in images taken there on 2023 January 9. The discovery is jointly credited to NASA's ATLAS robotic survey telescope at Sutherland Observatory, South Africa. Other comet discoveries associated with the historic Purple Mountain Observatory and bearing the observatory's transliterated Mandarin name in

In [27]:
problemasGIT = pd.DataFrame(data, index=[1])  # all values are scalars, so an index is required; the single row is labeled 1
problemasGIT

Unnamed: 0,copyright,date,explanation,hdurl,media_type,service_version,title,url
1,Purple Mountain (Tsuchinshan),2024-12-06,"Colorful and bright, this streaking fireball m...",https://apod.nasa.gov/apod/image/2412/PurpleMo...,image,v1,Xuyi Station and the Fireball,https://apod.nasa.gov/apod/image/2412/PurpleMo...
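Passing a dict whose values are all scalars to `pd.DataFrame` without an index raises a `ValueError`, which is why an index is supplied above. Wrapping the dict in a list, or using `pd.json_normalize`, builds the same kind of single-row frame with a default index starting at 0. A sketch with a hypothetical subset of the APOD fields shown above:

```python
import pandas as pd

# Hypothetical subset of the APOD response fields shown above
data = {
    "date": "2024-12-06",
    "media_type": "image",
    "title": "Xuyi Station and the Fireball",
}

# A list containing one dict -> one row with index 0
row_from_list = pd.DataFrame([data])

# json_normalize also flattens nested dicts into columns if there are any
row_from_normalize = pd.json_normalize(data)

print(row_from_list)
print(row_from_normalize)

try:
    pd.DataFrame(data)  # all scalar values and no index
    scalar_error = None
except ValueError as err:
    scalar_error = err
    print("ValueError:", err)
```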
