## Pandas First Steps

### Install and Import

Pandas can easily be installed via

`conda install pandas`  or  `pip install pandas`

If you are using the `Binder` repository we prepared for you, or you created your 
own local installation, pandas should already be installed.

We usually import pandas under a shorter name, as shown below:

In [1]:
import pandas as pd

### Series and DataFrames

Series and DataFrames are the two primary data structures of pandas.

- A `Series` is a one-dimensional labeled array capable of holding any data type
- A `DataFrame` is a two-dimensional table made up of a collection of `Series`

![title](img/dataFrame.PNG)

`DataFrames` and `Series` are quite similar in that many operations you can do with one you can also do with the other (e.g. filling in null values or calculating the mean).
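To make the similarity concrete, here is a minimal sketch that applies `fillna` and `mean` to both a `Series` and a `DataFrame`. The variable names and the missing value are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series: one labeled column of values (NaN marks a missing value)
pears = pd.Series([12, 34, np.nan, 6, 9], name="pears")

# A DataFrame: several columns sharing the same index
fruit_df = pd.DataFrame({"pears": pears, "apples": [32, 2, 3, 5, 10]})

# The same methods work on both structures
pears_filled = pears.fillna(0)      # Series in, Series out
fruit_filled = fruit_df.fillna(0)   # DataFrame in, DataFrame out

pears_mean = pears.mean()           # a single number (missing values are skipped)
fruit_means = fruit_df.mean()       # one mean per column, returned as a Series
```

Note that calling `mean` on a `DataFrame` returns a `Series`, which is one way the two structures fit together.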

## How to load data in JupyterLab

### 1. Creating your data manually

You can create your data set manually by building a `DataFrame` directly in Python. There are many different ways to do this. Example:

In [2]:
data = {
    'pears' : [12, 34, 7, 6, 9],
    'apples' : [32, 2, 3, 5, 10]
}
data_df = pd.DataFrame(data)

In [3]:
data_df

Unnamed: 0,pears,apples
0,12,32
1,34,2
2,7,3
3,6,5
4,9,10
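Since there are several ways to build a `DataFrame` by hand, the sketch below shows three other common ones. It reuses the first three rows of the fruit data; the row labels `Mon`, `Tue` and `Wed` are invented for the example.

```python
import pandas as pd

# From a list of rows, naming the columns explicitly
rows = [[12, 32], [34, 2], [7, 3]]
from_rows = pd.DataFrame(rows, columns=["pears", "apples"])

# From a list of dictionaries (one dictionary per row)
records = [
    {"pears": 12, "apples": 32},
    {"pears": 34, "apples": 2},
    {"pears": 7, "apples": 3},
]
from_records = pd.DataFrame(records)

# From a dictionary of columns, with custom row labels instead of 0, 1, 2
from_columns = pd.DataFrame(
    {"pears": [12, 34, 7], "apples": [32, 2, 3]},
    index=["Mon", "Tue", "Wed"],  # hypothetical labels
)
```

The first two produce the same table; which one is more convenient usually depends on how your raw data is already organized.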


### 2. Using a CSV file as your data source

Here we are going to use a CSV file which contains the list of TV shows and movies available on Netflix as of 2019.

In [4]:
# Let's use pandas to access some data in a CSV file

netflix_df = pd.read_csv('./data/Netflix_data/titles.csv')

In [6]:
#netflix_df
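If you do not have the Netflix file at hand, you can still try `pd.read_csv` on a small piece of CSV text. The sketch below is only an illustration: the column names and rows are invented and are not taken from `titles.csv`.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """title,type,release_year
Example Show,TV Show,2018
Example Movie,Movie,2019
Another Movie,Movie,2017
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

first_rows = sample_df.head(2)      # peek at the first two rows
n_rows, n_cols = sample_df.shape    # size of the table

# usecols loads only the columns you need
subset_df = pd.read_csv(io.StringIO(csv_text), usecols=["title", "release_year"])
```

Checking `head()` and `shape` right after loading is a quick way to confirm the file was read the way you expected.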

### 3. Using an Excel file as your data source

In [9]:
# Let's use pandas to access some data in an Excel file
#pd.read_excel
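As a rough sketch of how `pd.read_excel` could be used: the path `./data/example.xlsx` and the helper `load_excel` are hypothetical, so the code only reads the file if it exists. Reading `.xlsx` files also requires an Excel engine such as `openpyxl` to be installed.

```python
from pathlib import Path

import pandas as pd

# Hypothetical path: replace it with the location of your own workbook
excel_path = Path("./data/example.xlsx")


def load_excel(path, sheet_name=0):
    """Read one sheet of an Excel workbook into a DataFrame.

    sheet_name can be a position (0 is the first sheet) or a sheet title.
    """
    return pd.read_excel(path, sheet_name=sheet_name)


# Only attempt the read when the file is actually there
excel_df = load_excel(excel_path) if excel_path.exists() else None
```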

### 4. Using a JSON file as your data source
*JSON* (JavaScript Object Notation) is an open standard file format and data interchange format.
A JSON file uses human-readable text to store and transmit data objects.
These data objects consist of attribute-value pairs and arrays (just like a Python dictionary).

Here is an example:
    
![title](img/json.PNG)


In [10]:
# Let's use pandas to load a JSON file in JupyterLab (Wine data)
data_wine = pd.read_json('./data/Wine_data/winemag-data-130k-v2.json')

In [11]:
data_wine.head()

Unnamed: 0,points,title,description,taster_name,taster_twitter_handle,price,designation,variety,region_1,region_2,province,country,winery
0,87,Nicosia 2013 Vulkà Bianco (Etna),"Aromas include tropical fruit, broom, brimston...",Kerin O’Keefe,@kerinokeefe,,Vulkà Bianco,White Blend,Etna,,Sicily & Sardinia,Italy,Nicosia
1,87,Quinta dos Avidagos 2011 Avidagos Red (Douro),"This is ripe and fruity, a wine that is smooth...",Roger Voss,@vossroger,15.0,Avidagos,Portuguese Red,,,Douro,Portugal,Quinta dos Avidagos
2,87,Rainstorm 2013 Pinot Gris (Willamette Valley),"Tart and snappy, the flavors of lime flesh and...",Paul Gregutt,@paulgwine,14.0,,Pinot Gris,Willamette Valley,Willamette Valley,Oregon,US,Rainstorm
3,87,St. Julian 2013 Reserve Late Harvest Riesling ...,"Pineapple rind, lemon pith and orange blossom ...",Alexander Peartree,,13.0,Reserve Late Harvest,Riesling,Lake Michigan Shore,,Michigan,US,St. Julian
4,87,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,"Much like the regular bottling from 2012, this...",Paul Gregutt,@paulgwine,65.0,Vintner's Reserve Wild Child Block,Pinot Noir,Willamette Valley,Willamette Valley,Oregon,US,Sweet Cheeks
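To see how attribute-value pairs turn into columns, here is a small sketch that reads JSON text instead of a file. The two records are invented for illustration and only mimic the shape of the wine data.

```python
import io
import pandas as pd

# Hypothetical JSON: a list of objects, one object per row
json_text = """[
  {"title": "Example White 2013", "points": 87, "price": null, "country": "Italy"},
  {"title": "Example Red 2011", "points": 87, "price": 15.0, "country": "Portugal"}
]"""

# Each attribute name becomes a column; JSON null becomes a missing value (NaN)
wine_sample = pd.read_json(io.StringIO(json_text))

missing_prices = wine_sample["price"].isna().sum()
```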


### 5. Using a SQL database file as your data source: SQLite

**SQLite** is a C library that provides a lightweight disk-based database that doesn’t require a 
separate server process and allows accessing the database using a nonstandard variant of the SQL query language.

In [13]:
import sqlite3
conn = sqlite3.connect('stocks.db')

In [14]:
df = pd.read_sql_query("select * from stocks", conn)

In [15]:
df

Unnamed: 0,date,trans,symbol,qty,price
0,2006-01-05,BUY,RHAT,100.0,35.14
1,2006-03-28,BUY,IBM,1000.0,45.0
2,2006-04-05,BUY,MSFT,1000.0,72.0
3,2006-04-06,SELL,IBM,500.0,53.0
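If you do not have `stocks.db`, the sketch below rebuilds the same four rows in an in-memory SQLite database and then runs a filtered query. The table definition (column types) is an assumption; the `params` argument is a safe way to pass values into a query.

```python
import sqlite3

import pandas as pd

# ":memory:" creates a temporary database that lives only in RAM
conn = sqlite3.connect(":memory:")

# Assumed schema, chosen to match the columns shown in the output above
conn.execute(
    "CREATE TABLE stocks (date TEXT, trans TEXT, symbol TEXT, qty REAL, price REAL)"
)
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?, ?, ?, ?)",
    [
        ("2006-01-05", "BUY", "RHAT", 100, 35.14),
        ("2006-03-28", "BUY", "IBM", 1000, 45.0),
        ("2006-04-05", "BUY", "MSFT", 1000, 72.0),
        ("2006-04-06", "SELL", "IBM", 500, 53.0),
    ],
)
conn.commit()

# The ? placeholder is filled in from params, so no string formatting is needed
ibm_df = pd.read_sql_query(
    "SELECT * FROM stocks WHERE symbol = ?", conn, params=("IBM",)
)
conn.close()
```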



### 6. Using a real SQL database: PostgreSQL

(**This part will not work on your PC unless you have a PostgreSQL database fully configured with the right data**)

**PostgreSQL** is a free and open-source relational database management system emphasizing extensibility and SQL compliance.

**psycopg2** is the most popular PostgreSQL database adapter for Python.


In [16]:
import psycopg2

In [18]:
con = psycopg2.connect(database='dvdrental', user='postgres', password="a", host="127.0.0.1", port="5432")

In [19]:
cur = con.cursor()
cur.execute("select * from actor")
# fetchall() already returns a list of row tuples, so it can be passed to pandas directly
rows = cur.fetchall()
pd.DataFrame(rows)

Unnamed: 0,0,1,2,3
0,1,Penelope,Guiness,2013-05-26 14:47:57.620
1,2,Nick,Wahlberg,2013-05-26 14:47:57.620
2,3,Ed,Chase,2013-05-26 14:47:57.620
3,4,Jennifer,Davis,2013-05-26 14:47:57.620
4,5,Johnny,Lollobrigida,2013-05-26 14:47:57.620
...,...,...,...,...
195,196,Bela,Walken,2013-05-26 14:47:57.620
196,197,Reese,West,2013-05-26 14:47:57.620
197,198,Mary,Keitel,2013-05-26 14:47:57.620
198,199,Julia,Fawcett,2013-05-26 14:47:57.620
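The columns above are labeled 0 to 3 because a plain list of rows carries no column names. One possible way to recover them is the cursor's `description` attribute, which is part of the Python database API that both `sqlite3` and `psycopg2` follow. The sketch below uses a hypothetical helper, `query_to_dataframe`, and demonstrates it on an in-memory SQLite table so that it runs without a PostgreSQL server; the column names `actor_id`, `first_name` and `last_name` are assumptions.

```python
import sqlite3

import pandas as pd


def query_to_dataframe(connection, sql):
    """Run a query and return the result as a DataFrame with named columns."""
    cursor = connection.cursor()
    cursor.execute(sql)
    # The first item of each description entry is the column name
    column_names = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    cursor.close()
    return pd.DataFrame(rows, columns=column_names)


# Stand-in database: SQLite in memory instead of PostgreSQL
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE actor (actor_id INTEGER, first_name TEXT, last_name TEXT)"
)
demo_conn.executemany(
    "INSERT INTO actor VALUES (?, ?, ?)",
    [(1, "Penelope", "Guiness"), (2, "Nick", "Wahlberg")],
)

actor_df = query_to_dataframe(demo_conn, "SELECT * FROM actor")
demo_conn.close()
```

With a working PostgreSQL setup, the same helper should accept the `psycopg2` connection in place of `demo_conn`.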


### 7. Using an Application Programming Interface (API)

APIs within a program are sets of standards which permit outside software systems to request information from the original program.

For this example we are going to use the free API of **finnhub.io**.


In [20]:
# Looking at the api documentation and leveraging the data

import requests
r = requests.get('https://finnhub.io/api/v1/stock/metric?symbol=AAPL&metric=all&token=')
print(r.json())

{'metric': {'10DayAverageTradingVolume': 46.18057, '13WeekPriceReturnDaily': 43.31087, '26WeekPriceReturnDaily': 38.87761, '3MonthAverageTradingVolume': 763.06617, '52WeekHigh': 457.65, '52WeekHighDate': '2020-08-06', '52WeekLow': 199.15, '52WeekLowDate': '2019-08-12', '52WeekPriceReturnDaily': 118.4781, '5DayPriceReturnDaily': 4.56663, 'assetTurnoverAnnual': 0.73888, 'assetTurnoverTTM': 0.85636, 'beta': 1.22156, 'bookValuePerShareAnnual': 20.36534, 'bookValuePerShareQuarterly': 16.87279, 'bookValueShareGrowth5Y': 1.38123, 'capitalSpendingGrowth5Y': 1.35289, 'cashFlowPerShareAnnual': 14.5847, 'cashFlowPerShareTTM': 15.77877, 'cashPerSharePerShareAnnual': 22.63148, 'cashPerSharePerShareQuarterly': 21.71483, 'currentDividendYieldTTM': 0.70424, 'currentEv/freeCashFlowAnnual': 44.17318, 'currentEv/freeCashFlowTTM': 40.34478, 'currentRatioAnnual': 1.54013, 'currentRatioQuarterly': 1.46945, 'dividendGrowthRate5Y': 11.2299, 'dividendPerShare5Y': 2.456, 'dividendPerShareAnnual': 3, 'dividendYi
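The response is a nested dictionary, which is hard to read as raw text. As a minimal sketch, the example below takes three values copied from the output above and loads them into pandas; with a live response you would use `r.json()` in place of the hand-written `payload`.

```python
import pandas as pd

# A small excerpt of the response shown above (stand-in for r.json())
payload = {
    "metric": {
        "52WeekHigh": 457.65,
        "52WeekLow": 199.15,
        "beta": 1.22156,
    }
}

# The inner dictionary maps metric names to values: a natural fit for a Series
metrics = pd.Series(payload["metric"], name="AAPL")

# to_frame() turns the Series into a one-column DataFrame
metrics_df = metrics.to_frame()

# Values can then be looked up by label
week_range = metrics["52WeekHigh"] - metrics["52WeekLow"]
```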

## Converting back to CSV, JSON, or SQL

So after extensive work on cleaning your data, you’re now ready to save it as a file of your choice. Similar to the ways we read in data, pandas provides intuitive commands to save it:

In [21]:
df

Unnamed: 0,date,trans,symbol,qty,price
0,2006-01-05,BUY,RHAT,100.0,35.14
1,2006-03-28,BUY,IBM,1000.0,45.0
2,2006-04-05,BUY,MSFT,1000.0,72.0
3,2006-04-06,SELL,IBM,500.0,53.0
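The saving commands mirror the reading ones: `to_csv`, `to_json` and `to_sql`. The sketch below uses a two-row stand-in for `df` called `stocks_df`, writes to a temporary folder, and uses an in-memory SQLite database; the file names and the table name `stocks_copy` are made up for the example.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

# Two rows from the stocks table, standing in for the full df
stocks_df = pd.DataFrame(
    {
        "date": ["2006-01-05", "2006-03-28"],
        "trans": ["BUY", "BUY"],
        "symbol": ["RHAT", "IBM"],
        "qty": [100.0, 1000.0],
        "price": [35.14, 45.0],
    }
)

# Write into a temporary folder so nothing in your project is overwritten
out_dir = Path(tempfile.mkdtemp())

# index=False leaves out the 0, 1, 2... row labels
stocks_df.to_csv(out_dir / "stocks.csv", index=False)

# orient="records" writes one JSON object per row
stocks_df.to_json(out_dir / "stocks.json", orient="records")

# to_sql writes the DataFrame as a table in the connected database
conn = sqlite3.connect(":memory:")
stocks_df.to_sql("stocks_copy", conn, index=False, if_exists="replace")
round_trip = pd.read_sql_query("SELECT * FROM stocks_copy", conn)
conn.close()
```

Reading the saved data back, as done with `round_trip`, is a simple way to check that nothing was lost on the way out.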


## Basic DataFrame operations

DataFrames possess hundreds of methods and other operations that are crucial to any analysis. As a beginner, you should know the operations that perform simple transformations of your data and those that provide fundamental statistical analysis.
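As a first taste, here is a short sketch of a few simple transformations and summary statistics on the fruit data from earlier. The new column name `total` is chosen only for illustration.

```python
import pandas as pd

data_df = pd.DataFrame(
    {"pears": [12, 34, 7, 6, 9], "apples": [32, 2, 3, 5, 10]}
)

# Transformation: add a column computed from existing ones
data_df["total"] = data_df["pears"] + data_df["apples"]

# Transformation: sort rows, largest total first
sorted_df = data_df.sort_values("total", ascending=False)

# Transformation: keep only the rows that satisfy a condition
many_pears = data_df[data_df["pears"] > 8]

# Statistics: count, mean, std, min, quartiles and max for every numeric column
summary = data_df.describe()

# Statistics: a single value for a single column
pears_mean = data_df["pears"].mean()
```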

In [None]:
### Viewing your data
