# Data Loading, Storage, and File Formats

Input and output typically fall into a few main categories:
- reading text files and other, more efficient on-disk formats
- loading data from databases
- interacting with network sources such as web APIs

## Reading and Writing Data in Text Format

In [2]:
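from io import StringIO

import pandas as pd
import requests

url = 'https://news.google.com/covid19/map?hl=en-IN&mid=/m/03rk0&gl=IN&ceid=IN:en'

# Browser-like headers sent with the request
header = {
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
  "X-Requested-With": "XMLHttpRequest"
}

r = requests.get(url, headers=header)

# read_html returns a list of DataFrames, one per table found in the page;
# wrapping the text in StringIO avoids passing a literal HTML string, which newer pandas deprecates
df = pd.read_html(StringIO(r.text))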

In [10]:
del df[0]['Cases per 1 million people']
df[0].columns

Index(['Location', 'Confirmed', 'Recovered', 'Deaths'], dtype='object')

In [20]:
# Keep only the rows that hold a value (not 'No data'), then convert the strings to integers
deaths = df[0]['Deaths'][df[0]['Deaths']!='No data'].astype(int)
Confirmed = df[0]['Confirmed'][df[0]['Confirmed']!='No data'].astype(int)

In [24]:
#deaths/Confirmed * 100
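
The commented-out ratio above would divide two Series that were filtered with different masks. pandas aligns them on index labels, so any row missing from either Series comes out as NaN. A minimal sketch of one alternative is to coerce the placeholders with `pd.to_numeric(errors='coerce')` instead of filtering. The small table below is a hypothetical stand-in for `df[0]`, and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df[0]; locations and numbers are invented.
table = pd.DataFrame({
    'Location': ['A', 'B', 'C'],
    'Confirmed': ['1000', 'No data', '200'],
    'Deaths': ['10', '5', 'No data'],
})

# Turn non-numeric placeholders such as 'No data' into NaN.
confirmed = pd.to_numeric(table['Confirmed'], errors='coerce')
deaths = pd.to_numeric(table['Deaths'], errors='coerce')

# Deaths as a percentage of confirmed cases; NaN where either value is missing.
table['death_pct'] = deaths / confirmed * 100
print(table)
```

Keeping the result as a column of the same table leaves every location in place, with missing ratios marked explicitly.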

In [25]:
df = pd.read_csv('data/ex1.csv')
df

,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [26]:
pd.read_table('data/ex1.csv', sep=',')

,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [30]:
pd.read_csv('data/ex1.csv', header=None, skiprows=1, names=['a', 'b', 'c', 'd', 'message'], index_col='message')

message,a,b,c,d
hello,1,2,3,4
world,5,6,7,8
foo,9,10,11,12


In [31]:
pd.read_csv('data/ex2.csv', skiprows=[0, 2, 3])

,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [32]:
sentinels = {'message': ['foo', 'NA'], 'something': ['two']}
pd.read_csv('data/ex3.csv', na_values=sentinels)

,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,,5,6,,8,world
2,three,9,10,11.0,12,
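
The contents of `data/ex3.csv` are not shown here. As a minimal sketch, the in-memory text below is an assumed file body chosen to be consistent with the output above. It illustrates how a dict passed to `na_values` applies a different set of sentinels to each column.

```python
import io

import pandas as pd

# Assumed file contents, chosen to match the output shown above.
csv_text = """something,a,b,c,d,message
one,1,2,3,4,NA
two,5,6,,8,world
three,9,10,11,12,foo
"""

# Per-column sentinels: 'foo' and 'NA' count as missing only in 'message',
# and 'two' counts as missing only in 'something'.
sentinels = {'message': ['foo', 'NA'], 'something': ['two']}
frame = pd.read_csv(io.StringIO(csv_text), na_values=sentinels)

# isna() marks each cell that was read as missing.
print(frame.isna())
```

The empty field in column `c` is treated as missing by default, independently of the sentinels.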


## Reading Text Files in Pieces

When processing very large files or figuring out the right set of arguments to correctly process a large file, you may only want to read in a small piece of a file or iterate through smaller chunks of the file.
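
A minimal sketch of both approaches follows, using a small in-memory CSV as a hypothetical stand-in for a large file. `nrows` reads only the first few rows, and `chunksize` makes `read_csv` return an iterator of DataFrames.

```python
import io

import pandas as pd

# Small in-memory stand-in for a large file (hypothetical data).
csv_text = "key,value\n" + "\n".join(f"{'abc'[i % 3]},{i}" for i in range(10))

# nrows limits the read to the first few rows.
head = pd.read_csv(io.StringIO(csv_text), nrows=3)

# chunksize yields DataFrames with at most that many rows each,
# so an aggregate can be accumulated without loading the whole file.
totals = {}
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    for key, subtotal in chunk.groupby('key')['value'].sum().items():
        totals[key] = totals.get(key, 0) + int(subtotal)

print(head)
print(totals)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the file path.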

In [34]:
import numpy as np

# Build a small daily time series and write it to CSV
dates = pd.date_range('1/1/2000', periods=7)
ts = pd.Series(np.arange(7), index=dates)
ts.to_csv('data/ts.csv')

## JSON Data

In [35]:
obj = """
{"name": "Wes",
 "places_lived": ["United States", "Spain", "Germany"],
 "pet": null,
 "siblings": [{"name": "Scott", "age": 30, "pets": ["Zeus", "Zuko"]},
              {"name": "Katie", "age": 38,
               "pets": ["Sixes", "Stache", "Cisco"]}]
}
"""

In [36]:
import json
result = json.loads(obj)
result

{'name': 'Wes',
 'places_lived': ['United States', 'Spain', 'Germany'],
 'pet': None,
 'siblings': [{'name': 'Scott', 'age': 30, 'pets': ['Zeus', 'Zuko']},
  {'name': 'Katie', 'age': 38, 'pets': ['Sixes', 'Stache', 'Cisco']}]}

In [39]:
siblings = pd.DataFrame(result['siblings'], columns=['name', 'age'])
siblings

,name,age
0,Scott,30
1,Katie,38
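
The cells above go from a JSON string to Python objects to a DataFrame. A minimal sketch of the reverse direction, using the same two sibling records, might look like this: `json.dumps` serializes Python objects, `pd.read_json` reads a JSON array of objects as one row per object, and `DataFrame.to_json` writes the frame back out.

```python
import io
import json

import pandas as pd

records = [{'name': 'Scott', 'age': 30}, {'name': 'Katie', 'age': 38}]

# json.dumps is the inverse of json.loads: Python objects -> JSON string.
as_json = json.dumps(records)

# A JSON array of objects is read as one row per object.
siblings = pd.read_json(io.StringIO(as_json))

# orient='records' emits one JSON object per row.
round_trip = json.loads(siblings.to_json(orient='records'))
print(round_trip)
```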


## Interacting with Web APIs

In [41]:
import requests
url = 'https://api.github.com/repos/pandas-dev/pandas/issues'
resp = requests.get(url)

data = resp.json()
#data

In [43]:
issues = pd.DataFrame(data, columns=['number', 'title', 'labels', 'state'])
issues

,number,title,labels,state
0,35460,BUG: AssertionError: Number of Block dimension...,"[{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...",open
1,35459,CI: activate github actions on 1.1.x (PR only)...,"[{'id': 48070600, 'node_id': 'MDU6TGFiZWw0ODA3...",open
2,35456,Add note about limited propagation of attrs,[],open
3,35455,REGR: DataFrame.to_numpy(dtype=str) raises Run...,"[{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...",open
4,35454,DOC: update Python support policy,"[{'id': 134699, 'node_id': 'MDU6TGFiZWwxMzQ2OT...",open
5,35451,WEB: Fixing whatsnew link in the home page (ve...,"[{'id': 1508144531, 'node_id': 'MDU6TGFiZWwxNT...",open
6,35450,BUG: dataframe.any() method behaves differentl...,"[{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...",open
7,35449,BUG: unique() casts its types' elements from `...,"[{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...",open
8,35447,Updated chunksize docstring for DataFrame.to_c...,[],open
9,35446,"BUG: pd.testing.assert_frame_equal(..., check_...","[{'id': 76811, 'node_id': 'MDU6TGFiZWw3NjgxMQ=...",open
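
The `labels` column above holds a list of label objects for each issue. A minimal sketch of reducing each list to its label names follows. The payload is a hypothetical, trimmed stand-in for the API response (its values are invented), so the example runs without network access.

```python
import pandas as pd

# Hypothetical payload shaped like the issues response; fields trimmed, values invented.
# With a live response, resp.raise_for_status() can be called before resp.json()
# to fail early on HTTP errors.
data = [
    {'number': 1, 'title': 'Example bug', 'state': 'open',
     'labels': [{'id': 10, 'name': 'Bug'}, {'id': 11, 'name': 'IO CSV'}]},
    {'number': 2, 'title': 'Example docs change', 'state': 'open', 'labels': []},
]

issues = pd.DataFrame(data, columns=['number', 'title', 'labels', 'state'])

# Replace each list of label objects with a list of label names.
issues['labels'] = issues['labels'].apply(
    lambda labels: [label['name'] for label in labels]
)
print(issues)
```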


<hr/>