## Loading data into Pandas

There are several ways to load data into Pandas. The library is flexible and can read many popular data formats. This notebook shows how to load data from both local and remote sources.

### Create a Pandas DataFrame from an online CSV
This example reads from a publicly available GitHub repository. A private repository requires authentication.

In [None]:
import pandas as pd
csv_url = "https://raw.githubusercontent.com/paiml/wine-ratings/main/wine-ratings.csv"
# set index_col to 0 to tell pandas that the first column is the index
df = pd.read_csv(csv_url, index_col=0)
df.head(10)
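
To see what `index_col=0` changes without needing network access, here is a minimal sketch that parses a small in-memory CSV twice. The column names and values (`name`, `rating`, `Wine A`, `Wine B`) are made up for illustration and are not taken from the wine-ratings file.

```python
import io
import pandas as pd

# Hypothetical CSV text whose first column is an unnamed row index
csv_text = (
    ",name,rating\n"
    "0,Wine A,91.0\n"
    "1,Wine B,89.5\n"
)

# Without index_col, the unnamed first column becomes a regular column
with_default = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column is used as the DataFrame index
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(with_default.columns))
print(list(with_index.columns))
```

When the first column has no header, pandas names it `Unnamed: 0` unless it is consumed as the index.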

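Some sources require authentication, in which case the code looks like this:

In [None]:
import io
from base64 import b64encode
import requests

username = 'myUser'
password = 'mypassword!'
# Certificate verification stays on (the default); verify=False would skip TLS checks
response = requests.get(csv_url, auth=(username, password))
response.raise_for_status()  # stop here on a 401/403 instead of parsing an error page
df = pd.read_csv(io.StringIO(response.text), index_col=0)

# Alternative method: for HTTP(S) URLs, pandas sends storage_options as request headers
token = b64encode(f'{username}:{password}'.encode()).decode()
df = pd.read_csv('http://localhost:8000/test.csv', storage_options={'Authorization': f'Basic {token}'})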
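
The `Authorization` header used for HTTP Basic authentication is the word `Basic` followed by the base64 encoding of `username:password`. The sketch below builds that header as a plain dictionary of strings. The helper name `basic_auth_header` is hypothetical, and the credentials are the placeholder values from this notebook.

```python
from base64 import b64encode


def basic_auth_header(username: str, password: str) -> dict:
    """Return an HTTP Basic Authorization header as a dict of strings."""
    raw = f"{username}:{password}".encode("utf-8")
    token = b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, as used elsewhere in this notebook
headers = basic_auth_header("myUser", "mypassword!")
print(headers)
```

A dictionary like this can be passed as `storage_options` to `pd.read_csv` for an HTTP(S) URL, or as `headers` to `requests.get`. Base64 is an encoding, not encryption, so Basic authentication should only be sent over HTTPS.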

## Load a CSV from a local file

In [1]:
import pandas as pd
df = pd.read_csv("world-championship-qualifier.csv")
print(df)


    Rank                    Name     Nationality      Result    Notes Group
0    1.0           Svatoslav Ton  Czech Republic        2.14        q     A
1    1.0            Toni Huikuri        Finlandi        2.14        q     A
2    1.0          James Brierley  United Kingdom        2.14        q     A
3    1.0           Noriyasu Arai           Japan        2.14        q     A
4    5.0         Yannick Tregaro          Sweden        2.14        q     A
5    5.0       Dejan Vreljakovic              FR  Yugoslavia  2.14\tq     A
6    7.0            Alfredo Deza            Peru        2.10      NaN     A
7    8.0         Vagner Principe          Brazil        2.10      NaN     A
8    9.0  Alberto Juantorena Jr.            Cuba        2.10      NaN     A
9   10.0         Marcin Kaczocha          Poland        2.10      NaN     A
10  11.0         Andrey Krasulya         Ukraine        2.05      NaN     A
11  12.0            David Larsen   United States        2.05      NaN     A
12  13.0    
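
In the output above, the row at index 5 looks shifted: `Yugoslavia` landed in the `Result` column and `2.14\tq` in `Notes`, which suggests a formatting problem in that line of the source file. As a minimal sketch, assuming `Result` is meant to be numeric, rows like this can be flagged by coercing the column to numbers. The three sample rows are copied from the output above into an in-memory CSV.

```python
import io
import pandas as pd

# Three rows copied from the printed output, including the shifted one
sample = io.StringIO(
    "Rank,Name,Nationality,Result,Notes,Group\n"
    "5.0,Yannick Tregaro,Sweden,2.14,q,A\n"
    "5.0,Dejan Vreljakovic,FR,Yugoslavia,2.14\tq,A\n"
    "7.0,Alfredo Deza,Peru,2.10,,A\n"
)
sample_df = pd.read_csv(sample)

# Values that cannot be parsed as numbers become NaN
result_numeric = pd.to_numeric(sample_df["Result"], errors="coerce")

# Keep only the rows whose Result failed to parse
bad_rows = sample_df[result_numeric.isna()]
print(bad_rows)
```

This only detects the problem. Repairing the row means correcting the source file or reassigning the shifted values by hand.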

## Load JSON from a local file

In [2]:
df = pd.read_json("world-championship-qualifier.json")
df

Unnamed: 0,Rank,Name,Nationality,Result,Notes,Group
0,1.0,Svatoslav Ton,Czech Republic,2.14,q,A
1,1.0,Toni Huikuri,Finlandi,2.14,q,A
2,1.0,James Brierley,United Kingdom,2.14,q,A
3,1.0,Noriyasu Arai,Japan,2.14,q,A
4,5.0,Yannick Tregaro,Sweden,2.14,q,A
5,5.0,Dejan Vreljakovic,FR,Yugoslavia,2.14\tq,A
6,7.0,Alfredo Deza,Peru,2.10,,A
7,8.0,Vagner Principe,Brazil,2.10,,A
8,9.0,Alberto Juantorena Jr.,Cuba,2.10,,A
9,10.0,Marcin Kaczocha,Poland,2.10,,A
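
`pd.read_json` accepts file-like objects as well as paths, which makes it easy to try out on a small in-memory document. The sketch below assumes a list-of-records layout; the layout of `world-championship-qualifier.json` is not shown in this notebook, so this is an illustration, not a description of that file. The two records reuse values from the output above.

```python
import io
import pandas as pd

# Assumed layout: a JSON array with one object per row
json_text = """
[
  {"Rank": 1.0, "Name": "Svatoslav Ton", "Result": 2.14},
  {"Rank": 7.0, "Name": "Alfredo Deza", "Result": 2.10}
]
"""

# Wrap the text in StringIO so pandas treats it as a file-like object
records_df = pd.read_json(io.StringIO(json_text), orient="records")
print(records_df)
```

Note that `read_json` infers dtypes, so whole-number floats such as `1.0` may be loaded as integers.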


## You can read from many formats

The `pd` module provides reader functions for many formats, including your clipboard:

- read_clipboard
- read_csv
- read_excel
- read_feather
- read_fwf
- read_gbq
- read_hdf
- read_html
- read_json
- read_orc
- read_parquet
- read_pickle
- read_sas
- read_spss
- read_sql
- read_sql_query
- read_sql_table
- read_stata
- read_table
- read_xml
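
The exact set of readers depends on the installed pandas version, so a list like the one above can drift over time. A short sketch for listing the readers available in the current environment:

```python
import pandas as pd

# Collect every public attribute of the pandas module that starts with "read_"
readers = sorted(name for name in dir(pd) if name.startswith("read_"))

for name in readers:
    print(name)
```

Some readers need optional dependencies at call time (for example an Excel or Parquet engine), so a function appearing in this list does not guarantee it will run without extra packages.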