# Pandas

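The pandas library has three main data structures:
1. Series
2. DataFrame
3. Panel

Panel, a three-dimensional container, was deprecated in pandas 0.20 and removed in pandas 0.25; a DataFrame with a MultiIndex is the usual replacement.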

## Series

A Series is a one-dimensional labeled array. Under the hood its values are typically stored in a 1D NumPy array, which is paired with an array of labels called the index.

```python
import pandas as pd

ser = pd.Series(data, index=idx)
```

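where `data` can be an ndarray, a Python dictionary, or a scalar value, and `idx` is an optional list of labels. When `data` is an ndarray and no index is given, pandas assigns the integer labels 0 to n-1.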
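A minimal sketch putting the three kinds of `data` side by side; the labels and values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# From an ndarray: an explicit index supplies the labels
# (without one, pandas would use 0 .. n-1)
from_array = pd.Series(np.array([10, 20, 30]), index=["a", "b", "c"])

# From a dict: the keys become the labels
from_dict = pd.Series({"a": 10, "b": 20, "c": 30})

# From a scalar: the value is repeated for every label in the index
from_scalar = pd.Series(10, index=["a", "b", "c"])

print(from_array, from_dict, from_scalar, sep="\n\n")
```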

```python
import numpy as np

np.random.seed(100)
ser = pd.Series(np.random.rand(7))
ser
```
```
0    0.543405
1    0.278369
2    0.424518
3    0.844776
4    0.004719
5    0.121569
6    0.670749
dtype: float64
```

```python
import calendar as cal

monthNames = [cal.month_name[i] for i in range(1, 6)]
months = pd.Series(np.arange(1, 6), index=monthNames)
months
```
```
January     1
February    2
March       3
April       4
May         5
dtype: int64
```

```python
months.index
```
```
Index([u'January', u'February', u'March', u'April', u'May'], dtype='object')
```

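If `data` is a dictionary and no `index` is passed, the labels are taken from the dictionary's keys. If an `index` is passed, only the values for those labels are pulled from the dictionary, and any label without a matching key gets `NaN` (see `YHOO` below). The outputs in these notes come from an older pandas release on Python 2 (hence the `u'...'` prefixes); in pandas 0.23 and later on Python 3.6+, a Series built from a dictionary keeps the keys in insertion order instead of sorting them.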

```python
currDict = {'US': 'dollar', 'India': 'rupees', 'UK': 'pound', 'Germany': 'euro'}
currSeries = pd.Series(currDict)
currSeries
```
```
Germany      euro
India      rupees
UK          pound
US         dollar
dtype: object
```

```python
stockPrices = {'GOOG': 11.23, 'FB': 62.53, 'TWTR': 64.50, 'AMZN': 358.6}
stockPriceSeries = pd.Series(stockPrices,
                             index=['GOOG', 'FB', 'YHOO', 'TWTR', 'AMZN'],
                             name='stockPrices')
stockPriceSeries
```
```
GOOG     11.23
FB       62.53
YHOO       NaN
TWTR     64.50
AMZN    358.60
Name: stockPrices, dtype: float64
```

The `name` attribute is useful when combining Series objects into a DataFrame: the name becomes the column label.
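A small sketch of that, assuming two hypothetical named Series (`open` and `close`, with made-up prices) combined with `pd.concat` along the columns axis.

```python
import pandas as pd

# Two named Series sharing some labels (illustrative values)
open_prices = pd.Series({"GOOG": 11.0, "FB": 62.0}, name="open")
close_prices = pd.Series({"GOOG": 11.23, "FB": 62.53, "TWTR": 64.50}, name="close")

# Concatenating along axis=1 turns each Series into a column labelled by its name;
# rows are aligned on the index, and missing combinations become NaN
prices = pd.concat([open_prices, close_prices], axis=1)
print(prices)
```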

If `data` is a scalar, the value is repeated for every label in `index`:

```python
dogSeries = pd.Series('chihuahua', index=['breed', 'countryOfOrigin', 'name', 'gender'])
dogSeries
```
```
breed              chihuahua
countryOfOrigin    chihuahua
name               chihuahua
gender             chihuahua
dtype: object
```

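Without an `index`, a scalar produces a Series of length one with the default label `0`:

```python
dogSeries = pd.Series('pekingese')
dogSeries
```
```
0    pekingese
dtype: object
```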

```python
type(dogSeries)
```
```
pandas.core.series.Series
```

### Operations on Series

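A Series behaves much like a fixed-size dictionary: values are looked up by label. For reference, the plain dictionary `currDict` from above supports the same kind of lookup:

```python
currDict
```
```
{'Germany': 'euro', 'India': 'rupees', 'UK': 'pound', 'US': 'dollar'}
```
```python
currDict['India']
```
```
'rupees'
```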
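The same operations on the Series built from `currDict`, as a short sketch. One detail worth seeing in code: `in` on a Series tests the labels, not the values, exactly as `in` on a dict tests the keys.

```python
import pandas as pd

currDict = {'US': 'dollar', 'India': 'rupees', 'UK': 'pound', 'Germany': 'euro'}
currSeries = pd.Series(currDict)

# Label lookup works like a dictionary key lookup
print(currSeries['India'])             # rupees

# `in` tests the labels (the index), just like dict keys ...
print('India' in currSeries)           # True
# ... so a value is not found this way; search .values instead
print('rupees' in currSeries)          # False
print('rupees' in currSeries.values)   # True
```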

```python
stockPriceSeries
```
```
GOOG     11.23
FB       62.53
YHOO       NaN
TWTR     64.50
AMZN    358.60
Name: stockPrices, dtype: float64
```

Looking up a label that does not exist in the index raises a `KeyError`:

```python
stockPriceSeries['MSFT']
```
```
KeyError: 'MSFT'
```

`get()` returns a default value instead of raising an exception:

```python
stockPriceSeries.get('MSFT', np.nan)  # returns NaN if the label doesn't exist
```
```
nan
```
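Two other ways to avoid the `KeyError`, sketched on the same `stockPriceSeries`: `reindex()` accepts several labels and fills the missing ones with `NaN`, and an explicit `in` check tests the index before looking up.

```python
import numpy as np
import pandas as pd

stockPriceSeries = pd.Series({'GOOG': 11.23, 'FB': 62.53, 'TWTR': 64.50, 'AMZN': 358.6},
                             index=['GOOG', 'FB', 'YHOO', 'TWTR', 'AMZN'], name='stockPrices')

# Several labels at once, some of them missing: reindex() fills the gaps with NaN
# instead of raising KeyError
subset = stockPriceSeries.reindex(['GOOG', 'MSFT'])
print(subset)

# Or check first with `in`, which tests the index labels
price = stockPriceSeries['MSFT'] if 'MSFT' in stockPriceSeries else np.nan
print(price)
```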

### Slicing

An integer slice selects elements by position:

```python
stockPriceSeries[:2]
```
```
GOOG    11.23
FB      62.53
Name: stockPrices, dtype: float64
```
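To make the intent explicit, pandas offers `.iloc` (by position) and `.loc` (by label). A minimal sketch; note that a label slice includes its end label, unlike a positional slice.

```python
import pandas as pd

stockPriceSeries = pd.Series([11.23, 62.53, None, 64.50, 358.6],
                             index=['GOOG', 'FB', 'YHOO', 'TWTR', 'AMZN'])

# Positional slice: end position excluded, same as stockPriceSeries[:2]
by_position = stockPriceSeries.iloc[:2]          # GOOG, FB

# Label slice: the end label is included
by_label = stockPriceSeries.loc['GOOG':'YHOO']   # GOOG, FB, YHOO

print(by_position, by_label, sep="\n\n")
```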

A boolean condition selects the elements for which it is true:

```python
stockPriceSeries[stockPriceSeries > 100]
```
```
AMZN    358.6
Name: stockPrices, dtype: float64
```
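Conditions can be combined with the element-wise operators `&`, `|` and `~`; Python's `and`/`or` raise a `ValueError` on a Series because its truth value is ambiguous. A short sketch on the same prices, with the thresholds chosen arbitrarily:

```python
import numpy as np
import pandas as pd

stockPriceSeries = pd.Series([11.23, 62.53, np.nan, 64.50, 358.6],
                             index=['GOOG', 'FB', 'YHOO', 'TWTR', 'AMZN'])

# Each condition needs parentheses because & and | bind more tightly
# than the comparison operators
mid_range = stockPriceSeries[(stockPriceSeries > 50) & (stockPriceSeries < 100)]
print(mid_range)   # FB and TWTR

# Comparisons with NaN are False, so YHOO never passes a numeric test;
# isna() finds the missing entries explicitly
missing = stockPriceSeries[stockPriceSeries.isna()]
print(missing)
```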

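NumPy functions such as `np.mean` and `np.std` accept a Series directly. The missing `YHOO` price is skipped, so both statistics are computed over the four known prices:

```python
np.mean(stockPriceSeries)
```
```
124.215
```
```python
np.std(stockPriceSeries)
```
```
136.99713400286885
```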
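One subtlety: `np.std` uses NumPy's convention of `ddof=0` (population standard deviation), while the `Series.std()` method defaults to `ddof=1` (sample standard deviation) and so returns a larger value. A sketch comparing the two on the same data:

```python
import numpy as np
import pandas as pd

stockPriceSeries = pd.Series([11.23, 62.53, np.nan, 64.50, 358.6],
                             index=['GOOG', 'FB', 'YHOO', 'TWTR', 'AMZN'])

population_std = np.std(stockPriceSeries)      # NumPy convention: ddof=0
sample_std = stockPriceSeries.std()            # pandas default: ddof=1
same_as_numpy = stockPriceSeries.std(ddof=0)   # matches np.std

# All three skip the NaN (YHOO), so they are computed over 4 values
print(population_std, sample_std, same_as_numpy)
```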

```python
ser
```
```
0    0.543405
1    0.278369
2    0.424518
3    0.844776
4    0.004719
5    0.121569
6    0.670749
dtype: float64
```

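Arithmetic operators and NumPy universal functions such as `np.sqrt` operate element-wise and keep the index:

```python
ser * ser
```
```
0    0.295289
1    0.077490
2    0.180215
3    0.713647
4    0.000022
5    0.014779
6    0.449904
dtype: float64
```
```python
np.sqrt(ser)
```
```
0    0.737160
1    0.527607
2    0.651550
3    0.919117
4    0.068694
5    0.348668
6    0.818993
dtype: float64
```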

```python
ser[4:]
```
```
4    0.004719
5    0.121569
6    0.670749
dtype: float64
```

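When two Series are combined, values are aligned by index label rather than by position. `ser[1:]` has the labels 1 to 6 and `ser[:-2]` has the labels 0 to 4, so only labels 1 to 4 appear in both; every other label gets `NaN`:

```python
ser[1:] + ser[:-2]
```
```
0         NaN
1    0.556739
2    0.849035
3    1.689552
4    0.009438
5         NaN
6         NaN
dtype: float64
```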
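If the `NaN` results are unwanted, the method form `Series.add()` takes a `fill_value` that stands in for a missing operand. A sketch reusing the same random `ser`:

```python
import numpy as np
import pandas as pd

np.random.seed(100)
ser = pd.Series(np.random.rand(7))

# The + operator leaves NaN wherever a label exists in only one operand
plain_sum = ser[1:] + ser[:-2]

# add() with fill_value=0 treats a missing operand as 0 instead,
# so only labels missing from both Series would stay NaN
filled_sum = ser[1:].add(ser[:-2], fill_value=0)
print(filled_sum)
```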

## DataFrame

A DataFrame is a two-dimensional labeled array, conceptually similar to a table. Its columns can have different types. Each column is a Series, so a DataFrame can be thought of as a dictionary of Series sharing a common index. Its size is mutable: columns can be inserted and deleted. The index (row labels) and the columns (column labels) enable fast lookups and are used to align and join data correctly.
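A minimal sketch of the "dictionary of Series" view, using a hypothetical two-column frame: selecting a column returns a Series, columns keep their own dtypes, and columns can be added and deleted in place.

```python
import pandas as pd

# Hypothetical example data: two columns of different types sharing one index
df = pd.DataFrame({
    'ticker': pd.Series(['GOOG', 'FB', 'AMZN']),
    'price': pd.Series([11.23, 62.53, 358.6]),
})

# Each column is a Series; dtypes can differ per column
col = df['price']
print(type(col), df.dtypes, sep="\n")

# Size is mutable: add a derived column, then delete one
df['price_x2'] = df['price'] * 2
del df['ticker']
print(df)
```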

Building a DataFrame from a dictionary of Series uses the dictionary keys as column labels and the union of the Series indexes as row labels. Rows are aligned by label, so the different label order in `YHOO` does not matter, and `FB`, which has no `Beta` entry, gets `NaN` there:

```python
stocks = {
    'AMZN': pd.Series([346.15, 0.59, 459, 0.52, 589.8, 158.88],
                      index=['Closing price', 'EPS', 'Shares Outstanding', 'Beta', 'P/E', 'Market Cap']),
    'GOOG': pd.Series([1133.43, 36.05, 335.83, 0.87, 31.44, 380.64],
                      index=['Closing price', 'EPS', 'Shares Outstanding', 'Beta', 'P/E', 'Market Cap']),
    'FB': pd.Series([61.48, 0.59, 2450, 104.93, 150.92],
                    index=['Closing price', 'EPS', 'Shares Outstanding', 'P/E', 'Market Cap']),  # no 'Beta'
    'YHOO': pd.Series([34.90, 1.27, 1010, 27.48, 0.66, 35.26],
                      index=['Closing price', 'EPS', 'Shares Outstanding', 'P/E', 'Beta', 'Market Cap']),
}

stockDF = pd.DataFrame(stocks)
stockDF
```

|  | AMZN | FB | GOOG | YHOO |
|---|---|---|---|---|
| Beta | 0.52 | NaN | 0.87 | 0.66 |
| Closing price | 346.15 | 61.48 | 1133.43 | 34.9 |
| EPS | 0.59 | 0.59 | 36.05 | 1.27 |
| Market Cap | 158.88 | 150.92 | 380.64 | 35.26 |
| P/E | 589.8 | 104.93 | 31.44 | 27.48 |
| Shares Outstanding | 459.0 | 2450.0 | 335.83 | 1010.0 |


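Passing `index` selects, and orders, the rows. A label that is not present in the data yields a row of `NaN`; here the misspelled `'Sahres Outstanding'` matches nothing:

```python
stockDF = pd.DataFrame(stocks, index=['Closing price', 'EPS', 'Sahres Outstanding', 'P/E'])
stockDF
```

|  | AMZN | FB | GOOG | YHOO |
|---|---|---|---|---|
| Closing price | 346.15 | 61.48 | 1133.43 | 34.9 |
| EPS | 0.59 | 0.59 | 36.05 | 1.27 |
| Sahres Outstanding | NaN | NaN | NaN | NaN |
| P/E | 589.8 | 104.93 | 31.44 | 27.48 |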


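Passing `columns` works the same way for columns: `TWTR` and `SCNW` are not keys of `stocks`, so those columns are entirely `NaN`:

```python
stockDF = pd.DataFrame(stocks,
                       index=['Closing price', 'EPS', 'Shares Outstanding', 'P/E', 'Market Cap', 'Beta'],
                       columns=['FB', 'TWTR', 'SCNW'])
stockDF
```

|  | FB | TWTR | SCNW |
|---|---|---|---|
| Closing price | 61.48 | NaN | NaN |
| EPS | 0.59 | NaN | NaN |
| Shares Outstanding | 2450.0 | NaN | NaN |
| P/E | 104.93 | NaN | NaN |
| Market Cap | 150.92 | NaN | NaN |
| Beta | NaN | NaN | NaN |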


```python
stockDF.index
```
```
Index([u'Closing price', u'EPS', u'Shares Outstanding', u'P/E', u'Market Cap',
       u'Beta'],
      dtype='object')
```

```python
stockDF.columns
```
```
Index([u'FB', u'TWTR', u'SCNW'], dtype='object')
```
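The index and columns are what label-based lookups use. A sketch on a cut-down version of the `stocks` data (only two tickers and two rows, to keep it short): `.loc` takes row labels first and column labels second, and `.at` is the accessor intended for a single row/column pair.

```python
import pandas as pd

stocks = {
    'AMZN': pd.Series([346.15, 0.59], index=['Closing price', 'EPS']),
    'FB': pd.Series([61.48, 0.59], index=['Closing price', 'EPS']),
}
df = pd.DataFrame(stocks)

# .loc[row_label, column_label] returns a single value
print(df.loc['EPS', 'FB'])

# A row label alone returns that row as a Series indexed by the column labels
print(df.loc['Closing price'])

# .at accesses exactly one (row, column) pair
print(df.at['Closing price', 'AMZN'])
```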

A dictionary of equal-length lists also works: each list becomes a column, and the default integer index is used for the rows:

```python
algos = {
    'search': ['DFS', 'BFS', 'Binary Search', 'Linear', 'ShortestPath (Dijkstra)'],
    'sorting': ['Quicksort', 'Mergesort', 'Heapsort', 'Bubble sort', 'Insertion sort'],
    'machine learning': ['RandomForest', 'K Nearest Neighbor', 'Logistic Regression',
                         'K-Means Clustering', 'Linear Regression'],
}

algoDF = pd.DataFrame(algos)
algoDF
```

|  | machine learning | search | sorting |
|---|---|---|---|
| 0 | RandomForest | DFS | Quicksort |
| 1 | K Nearest Neighbor | BFS | Mergesort |
| 2 | Logistic Regression | Binary Search | Heapsort |
| 3 | K-Means Clustering | Linear | Bubble sort |
| 4 | Linear Regression | ShortestPath (Dijkstra) | Insertion sort |


Passing `index` replaces the default integer labels:

```python
pd.DataFrame(algos, index=['algo_1', 'algo_2', 'algo_3', 'algo_4', 'algo_5'])
```

|  | machine learning | search | sorting |
|---|---|---|---|
| algo_1 | RandomForest | DFS | Quicksort |
| algo_2 | K Nearest Neighbor | BFS | Mergesort |
| algo_3 | Logistic Regression | Binary Search | Heapsort |
| algo_4 | K-Means Clustering | Linear | Bubble sort |
| algo_5 | Linear Regression | ShortestPath (Dijkstra) | Insertion sort |


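Two comma-separated labels inside the brackets are interpreted as a single tuple key. No column has that tuple as its label, so pandas raises a `KeyError`:

```python
algoDF['machine learning', 'search']
```
```
KeyError: ('machine learning', 'search')
```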
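To select several columns, pass a list of labels, which gives the familiar double brackets. A sketch rebuilding `algoDF` from the `algos` dictionary above:

```python
import pandas as pd

algos = {
    'search': ['DFS', 'BFS', 'Binary Search', 'Linear', 'ShortestPath (Dijkstra)'],
    'sorting': ['Quicksort', 'Mergesort', 'Heapsort', 'Bubble sort', 'Insertion sort'],
    'machine learning': ['RandomForest', 'K Nearest Neighbor', 'Logistic Regression',
                         'K-Means Clustering', 'Linear Regression'],
}
algoDF = pd.DataFrame(algos)

# A list of column labels (note the double brackets) returns a DataFrame
subset = algoDF[['machine learning', 'search']]
print(subset)

# A single label still returns one column as a Series
print(algoDF['search'].head(2))
```

The outer brackets are the indexing operator and the inner ones build the list, so the column order in `subset` follows the order of that list.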