Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and operations for manipulating numerical tables and time series.

In [1]:
# "pd" is the conventional alias for pandas
import pandas as pd
import numpy as np

## 1. Creating DataFrames

#### DataFrame
A DataFrame is a two-dimensional table of values. Each entry sits at the intersection of a row (or record) and a column, and both carry labels: the row labels form the index and the column labels form the columns. We use the `pd.DataFrame()` constructor to create DataFrame objects.
For example, consider the following simple DataFrame:

In [2]:
df = pd.DataFrame(np.arange(0, 20).reshape(5, 4),                   # the values 0..19 in 5 rows x 4 columns
                  index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],    # row labels
                  columns=["Column1", "Column2", "Column3", "Column4"])  # column labels

In [3]:
df.head()  # show the first 5 rows; the row labels come from the index

,Column1,Column2,Column3,Column4
Row1,0,1,2,3
Row2,4,5,6,7
Row3,8,9,10,11
Row4,12,13,14,15
Row5,16,17,18,19
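
A DataFrame can also be built from a dict that maps each column name to its values; the `shape`, `index` and `columns` attributes then expose the dimensions and the labels. A minimal sketch that rebuilds the same 5x4 table column by column (the name `df_from_dict` is only illustrative):

```python
import numpy as np
import pandas as pd

# Same values as above, but passed as a dict of columns
values = np.arange(0, 20).reshape(5, 4)
df_from_dict = pd.DataFrame(
    {f"Column{i + 1}": values[:, i] for i in range(4)},
    index=[f"Row{i + 1}" for i in range(5)],
)

print(df_from_dict.shape)          # (5, 4): (rows, columns)
print(list(df_from_dict.index))    # row labels
print(list(df_from_dict.columns))  # column labels
```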


DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:

In [4]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']},index=["Restaurant A","Restaurant B"])

,Bob,Sue
Restaurant A,I liked it.,Pretty good.
Restaurant B,It was awful.,Bland.


#### Series 
A Series, by contrast, is a one-dimensional sequence of values with an index of labels. If a DataFrame is a table, a Series is a single labelled column (or list).

In [5]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64
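
Each column of a DataFrame is itself a Series that shares the DataFrame's index. A small sketch reusing the sales figures above; the 'Product B' numbers are made up purely for illustration:

```python
import pandas as pd

sales_a = pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')
sales_b = pd.Series([20, 25, 30], index=sales_a.index, name='Product B')  # illustrative values

# Series placed side by side become the columns of a DataFrame (names become column labels)
sales = pd.concat([sales_a, sales_b], axis=1)
print(sales)

# Selecting one column gives back a Series with the same index and name
column = sales['Product A']
print(type(column).__name__)  # Series
```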

#### Now let's make a DataFrame and save it as a CSV file

In [6]:
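# np.random.randint(low, high, size): the high bound is exclusive, and the values change on every run
user_data = {
    'MarksA': np.random.randint(1, 100, 5),
    'MarksB': np.random.randint(50, 100, 5),
    'MarksC': np.random.randint(20, 80, 5)
}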

In [7]:
user_data

{'MarksA': array([20, 91, 97, 35, 43]),
 'MarksB': array([95, 69, 74, 57, 92]),
 'MarksC': array([58, 33, 26, 59, 60])}

In [8]:
df2=pd.DataFrame(user_data)
print(df2)

   MarksA  MarksB  MarksC
0      20      95      58
1      91      69      33
2      97      74      26
3      35      57      59
4      43      92      60


In [9]:
df2.to_csv('Test-2.csv')  # writes df2 to Test-2.csv; the index is saved as the first column
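
By default `to_csv` also writes the index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. A minimal in-memory sketch of the round trip (when no path is given, `to_csv` returns the CSV text as a string); `index=False` leaves the index out. The `marks` frame is illustrative:

```python
from io import StringIO

import pandas as pd

marks = pd.DataFrame({'MarksA': [20, 91, 97], 'MarksB': [95, 69, 74]})

with_index = marks.to_csv()                # first column holds the index 0, 1, 2
without_index = marks.to_csv(index=False)  # only the data columns

print(pd.read_csv(StringIO(with_index)).columns.tolist())     # ['Unnamed: 0', 'MarksA', 'MarksB']
print(pd.read_csv(StringIO(without_index)).columns.tolist())  # ['MarksA', 'MarksB']
```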

## 2. Accessing data

In [10]:
df

,Column1,Column2,Column3,Column4
Row1,0,1,2,3
Row2,4,5,6,7
Row3,8,9,10,11
Row4,12,13,14,15
Row5,16,17,18,19


#### iloc and loc
pandas provides two indexers for selecting data: `loc` selects by label (row and column names), while `iloc` selects by integer position.

In [11]:
df.loc['Row1']  # the row labelled 'Row1', all columns

Column1    0
Column2    1
Column3    2
Column4    3
Name: Row1, dtype: int32

In [12]:
df.loc[:, ['Column1', 'Column4']] # accessing all rows for columns 1 and 4

,Column1,Column4
Row1,0,3
Row2,4,7
Row3,8,11
Row4,12,15
Row5,16,19


In [13]:
type(df.loc['Row1'])  # a single row (or column) comes back as a Series

pandas.core.series.Series

In [15]:
df.iloc[0:3, 0:2]  # rows at positions 0-2 (position 3 is excluded), columns 0-1

,Column1,Column2
Row1,0,1
Row2,4,5
Row3,8,9


In [16]:
df.iloc[0, 0:2]  # first row, first two columns

Column1    0
Column2    1
Name: Row1, dtype: int32

In [17]:
df.iloc[[0, 1, 2], 0]  # a list of row positions, first column

Row1    0
Row2    4
Row3    8
Name: Column1, dtype: int32
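
One practical difference between the two indexers: a `loc` slice includes its end label, while an `iloc` slice excludes its end position, as in ordinary Python slicing. A small sketch on a copy of the 5x4 frame (named `frame` here so it stands alone):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(0, 20).reshape(5, 4),
                     index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
                     columns=['Column1', 'Column2', 'Column3', 'Column4'])

by_label = frame.loc['Row1':'Row3', 'Column1':'Column2']  # 'Row3' and 'Column2' are included
by_position = frame.iloc[0:3, 0:2]                         # position 3 and 2 are excluded

print(by_label.shape, by_position.shape)  # (3, 2) (3, 2)
print(by_label.equals(by_position))       # True
```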

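#### Native accessors can also be used
A column can be selected with square brackets or, when its name is a valid Python identifier that does not clash with a DataFrame method, as an attribute.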

In [18]:
df['Column3']  # bracket access returns the column as a Series

Row1     2
Row2     6
Row3    10
Row4    14
Row5    18
Name: Column3, dtype: int32

In [19]:
df.Column2  # attribute access, equivalent to df['Column2']

Row1     1
Row2     5
Row3     9
Row4    13
Row5    17
Name: Column2, dtype: int32

In [20]:
df[['Column3', 'Column4']]  # a list of names returns a DataFrame

,Column3,Column4
Row1,2,3
Row2,6,7
Row3,10,11
Row4,14,15
Row5,18,19


In [22]:
df['Column1'].iloc[1]  # second value of Column1, selected by position

4
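
For a single value, `.at` (by label) and `.iat` (by position) are the scalar counterparts of `loc` and `iloc`; they read or set one cell in a single step instead of chaining two indexing operations. A minimal sketch on a copy of the frame:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(0, 20).reshape(5, 4),
                     index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'],
                     columns=['Column1', 'Column2', 'Column3', 'Column4'])

print(frame.at['Row2', 'Column1'])  # 4, by label
print(frame.iat[1, 0])              # 4, by position

# Setting one cell in a single step (no chained indexing)
frame.at['Row2', 'Column1'] = 100
```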

In [23]:
# .values converts the selection to a NumPy array for numerical work (.to_numpy() is the newer equivalent)
df.iloc[:,1:].values

array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11],
       [13, 14, 15],
       [17, 18, 19]])

In [24]:
df['Column1'].value_counts()  # how often each value occurs

0     1
8     1
4     1
16    1
12    1
Name: Column1, dtype: int64

In [25]:
df['Column1'].sum()

40

## 3. Reading data files

In [26]:
wine_reviews = pd.read_csv('winedata.csv')  # replace with the path to your CSV file

In [27]:
wine_reviews.shape  # (number of rows, number of columns)

(150930, 11)

In [28]:
wine_reviews.head(n=7)  # first n rows; n defaults to 5

,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,1,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
2,2,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,96,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
3,3,US,"This spent 20 months in 30% new French oak, an...",Reserve,96,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
4,4,France,"This is the top wine from La Bégude, named aft...",La Brûlade,95,66.0,Provence,Bandol,,Provence red blend,Domaine de la Bégude
5,5,Spain,"Deep, dense and pure from the opening bell, th...",Numanthia,95,73.0,Northern Spain,Toro,,Tinta de Toro,Numanthia
6,6,Spain,Slightly gritty black-fruit aromas include a s...,San Román,95,65.0,Northern Spain,Toro,,Tinta de Toro,Maurodos
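
The `Unnamed: 0` column in this file looks like an index that was saved together with the data. Passing `index_col=0` to `read_csv` turns the first column into the index instead of a data column. A sketch on a tiny in-memory sample built from the first two rows above, trimmed to three columns (the sample string is an assumption, not the real file):

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with a leading unnamed index column, like winedata.csv
sample = ",country,points,price\n0,US,96,235.0\n1,Spain,96,110.0\n"

as_column = pd.read_csv(StringIO(sample))
as_index = pd.read_csv(StringIO(sample), index_col=0)

print(as_column.columns.tolist())  # ['Unnamed: 0', 'country', 'points', 'price']
print(as_index.columns.tolist())   # ['country', 'points', 'price']
```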


## 4. Analysing the data

In [31]:
wine_reviews.points.describe()  # summary statistics (count, mean, std, quartiles) for the column

count    150930.000000
mean         87.888418
std           3.222392
min          80.000000
25%          86.000000
50%          88.000000
75%          90.000000
max         100.000000
Name: points, dtype: float64

In [32]:
wine_reviews.points.mean()

87.8884184721394

In [34]:
wine_reviews.winery.unique()  # array of the distinct values

array(['Heitz', 'Bodega Carmen Rodríguez', 'Macauley', ..., 'Screwed',
       'Red Bucket', 'White Knot'], dtype=object)

In [35]:
wine_reviews.winery.value_counts()  # distinct values and how often each occurs, most frequent first

Williams Selyem          374
Testarossa               274
DFJ Vinhos               258
Chateau Ste. Michelle    225
Columbia Crest           217
                        ... 
Vuina                      1
Domaine des Rochers        1
Chesler                    1
Stellekaya                 1
Domaine Trois Frères       1
Name: winery, Length: 14810, dtype: int64

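#### Map and apply
`Series.map()` applies a function to every value of a Series and returns a new Series; `DataFrame.apply()` with `axis='columns'` calls a function once per row and returns a new DataFrame.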

In [39]:
wine_review_points_mean = wine_reviews.points.mean()
wine_reviews.points.map(lambda p: p - wine_review_points_mean)

0         8.111582
1         8.111582
2         8.111582
3         8.111582
4         7.111582
            ...   
150925    3.111582
150926    3.111582
150927    3.111582
150928    2.111582
150929    2.111582
Name: points, Length: 150930, dtype: float64
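
Because arithmetic between a Series and a scalar is applied element-wise, the same re-centering can be written without `map`; the vectorised form avoids a Python-level function call per value and is usually faster on large columns. A sketch on a few illustrative scores:

```python
import pandas as pd

points = pd.Series([96, 96, 95, 91, 84], name='points')  # illustrative scores
points_mean = points.mean()

via_map = points.map(lambda p: p - points_mean)
vectorised = points - points_mean  # element-wise subtraction, no per-value lambda

print(via_map.equals(vectorised))  # True
```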

In [40]:
def remean_points(row):
    row.points = row.points - wine_review_points_mean
    return row

wine_reviews.apply(remean_points, axis='columns')

,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,8.111582,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,1,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,8.111582,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
2,2,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,8.111582,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
3,3,US,"This spent 20 months in 30% new French oak, an...",Reserve,8.111582,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
4,4,France,"This is the top wine from La Bégude, named aft...",La Brûlade,7.111582,66.0,Provence,Bandol,,Provence red blend,Domaine de la Bégude
...,...,...,...,...,...,...,...,...,...,...,...
150925,150925,Italy,Many people feel Fiano represents southern Ita...,,3.111582,20.0,Southern Italy,Fiano di Avellino,,White Blend,Feudi di San Gregorio
150926,150926,France,"Offers an intriguing nose with ginger, lime an...",Cuvée Prestige,3.111582,27.0,Champagne,Champagne,,Champagne Blend,H.Germain
150927,150927,Italy,This classic example comes from a cru vineyard...,Terre di Dora,3.111582,20.0,Southern Italy,Fiano di Avellino,,White Blend,Terredora
150928,150928,France,"A perfect salmon shade, with scents of peaches...",Grand Brut Rosé,2.111582,52.0,Champagne,Champagne,,Champagne Blend,Gosset
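
`apply` with `axis='columns'` builds a new DataFrame from the rows the function returns. In this sketch the function copies each row before changing it, so the original frame keeps its values unless the result is assigned back. The data and names are illustrative:

```python
import pandas as pd

reviews = pd.DataFrame({'country': ['US', 'Spain', 'France'],
                        'points': [96, 96, 95]})  # illustrative rows
points_mean = reviews.points.mean()

def remean_points(row):
    # row is a Series holding one record; change a copy, not the input
    row = row.copy()
    row['points'] = row['points'] - points_mean
    return row

remeaned = reviews.apply(remean_points, axis='columns')

print(reviews['points'].tolist())  # [96, 96, 95]  (unchanged)
print(remeaned['points'].round(2).tolist())
```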


In [42]:
wine_reviews[wine_reviews['price'] > 100]  # boolean indexing: keep the rows where price is above 100

,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
1,1,Spain,"Ripe aromas of fig, blackberry and cassis are ...",Carodorum Selección Especial Reserva,96,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
7,7,Spain,Lush cedary black-fruit aromas are luxe and of...,Carodorum Único Crianza,95,110.0,Northern Spain,Toro,,Tinta de Toro,Bodega Carmen Rodríguez
14,14,US,"With its sophisticated mix of mineral, acid an...",Grace Vineyard,95,185.0,Oregon,Dundee Hills,Willamette Valley,Pinot Noir,Domaine Serene
16,16,US,"This blockbuster, powerhouse of a wine suggest...",Rainin Vineyard,95,325.0,California,Diamond Mountain District,Napa,Cabernet Sauvignon,Hall
...,...,...,...,...,...,...,...,...,...,...,...
149240,149240,Spain,With its dried-cherry and rose-petal aromas an...,Faustino I Rioja Gran Reserva,94,185.0,Northern Spain,Rioja,,Tempranillo,Bodegas Faustino
149247,149247,Spain,"Beautifully balanced, this shows all the posit...",Faustino I Rioja Gran Reserva,92,125.0,Northern Spain,Rioja,,Tempranillo,Bodegas Faustino
149471,149471,France,"This is a solid, powerful wine packed with tan...",,91,115.0,Bordeaux,Saint-Julien,,Bordeaux-style Red Blend,Château Ducru Beaucaillou
149631,149631,Portugal,"This is a very old 40-year old, with some hars...",40-year old tawny,84,130.0,Port,,,Port,Poças
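
Conditions can be combined with `&` (and) and `|` (or). Each comparison needs its own parentheses because these operators bind more tightly than `>` or `==`. `isin` tests membership in a list of values. A sketch on illustrative rows:

```python
import pandas as pd

reviews = pd.DataFrame({'country': ['US', 'Spain', 'US', 'France'],
                        'points': [96, 96, 88, 95],
                        'price': [235.0, 110.0, 20.0, 66.0]})  # illustrative rows

expensive_and_good = reviews[(reviews['price'] > 100) & (reviews['points'] >= 95)]
from_us_or_france = reviews[reviews['country'].isin(['US', 'France'])]

print(expensive_and_good.index.tolist())  # [0, 1]
print(from_us_or_france.index.tolist())   # [0, 2, 3]
```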


## 5. Grouping and sorting

In [43]:
wine_reviews.groupby('points').points.count()  # number of reviews at each score

points
80       898
81      1502
82      4041
83      6048
84     10708
85     12411
86     15573
87     20747
88     17871
89     12921
90     15973
91     10536
92      9241
93      6017
94      3462
95      1716
96       695
97       365
98       131
99        50
100       24
Name: points, dtype: int64
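
Counting rows per group is common enough that `value_counts` offers a shortcut; sorting its result by index gives the same counts as the `groupby` version. A sketch with illustrative scores:

```python
import pandas as pd

reviews = pd.DataFrame({'points': [88, 90, 88, 87, 90, 88]})  # illustrative scores

by_groupby = reviews.groupby('points').points.count()
by_value_counts = reviews.points.value_counts().sort_index()

print(by_groupby.tolist())       # [1, 3, 2]
print(by_value_counts.tolist())  # [1, 3, 2]
```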

In [44]:
wine_reviews.groupby('points').price.min()  # cheapest price at each score

points
80      5.0
81      5.0
82      5.0
83      4.0
84      4.0
85      4.0
86      4.0
87      6.0
88      6.0
89      7.0
90      5.0
91      8.0
92     11.0
93     12.0
94     15.0
95     20.0
96     20.0
97     42.0
98     50.0
99     65.0
100    65.0
Name: price, dtype: float64

In [45]:
wine_reviews.groupby(['country', 'province']).apply(lambda group: group.loc[group.points.idxmax()])  # highest-rated wine in each (country, province)

country,province,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
Albania,Mirditë,4642,Albania,This garnet-colored wine made from 100% Kallme...,,88,20.0,Mirditë,,,Kallmet,Arbëri
Argentina,Mendoza Province,65331,Argentina,"If the color doesn't tell the full story, the ...",Nicasia Vineyard,97,120.0,Mendoza Province,Mendoza,,Malbec,Bodega Catena Zapata
Argentina,Other,10619,Argentina,"Take note, this could be the best wine Colomé ...",Reserva,95,90.0,Other,Salta,,Malbec,Colomé
Australia,Australia Other,68251,Australia,This big wine presents a sophisticated bouquet...,Yattarna,92,65.0,Australia Other,South Eastern Australia,,Chardonnay,Penfolds
Australia,New South Wales,54205,Australia,"This wine's deep brassy color suggests honey, ...",Noble One Botrytis,93,32.0,New South Wales,New South Wales,,Sémillon,De Bortoli
...,...,...,...,...,...,...,...,...,...,...,...,...
Uruguay,Juanico,3160,Uruguay,This mature Bordeaux-style blend is earthy on ...,Preludio Barrel Select Lote N 77,90,45.0,Juanico,,,Red Blend,Familia Deicas
Uruguay,Montevideo,3164,Uruguay,"Bouza ranks as one of Uruguay's top wineries, ...",Monte Vide Eu Tannat-Merlot-Tempranillo,90,57.0,Montevideo,,,Red Blend,Bouza
Uruguay,Progreso,6541,Uruguay,Blackberry and plum aromas come with wood-smok...,RPF,89,23.0,Progreso,,,Tannat,Pisano
Uruguay,San Jose,70157,Uruguay,While this ranks as one of the best Uruguayan ...,El Preciado Premier Gran Reserva,89,60.0,San Jose,,,Red Blend,Castillo Viejo


In [50]:
countries_reviewed = wine_reviews.groupby(['country', 'province']).description.agg([len])  # number of reviews per (country, province)
countries_reviewed

country,province,len
Albania,Mirditë,2
Argentina,Mendoza Province,4742
Argentina,Other,889
Australia,Australia Other,553
Australia,New South Wales,246
...,...,...
Uruguay,Juanico,19
Uruguay,Montevideo,3
Uruguay,Progreso,5
Uruguay,San Jose,15
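
`agg` also accepts several functions at once and returns one column per function. A sketch on illustrative rows that reports the number of rows and the price range per country; note that `len` counts every row, while `'count'` skips missing values:

```python
import pandas as pd

reviews = pd.DataFrame({'country': ['US', 'US', 'Spain', 'Spain', 'Spain'],
                        'price': [235.0, 90.0, 110.0, 73.0, None]})  # illustrative rows

summary = reviews.groupby('country').price.agg([len, 'count', 'min', 'max'])
print(summary)
```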


In [51]:
countries_reviewed.sort_values(by='len', ascending=False)  # most-reviewed provinces first

country,province,len
US,California,44508
US,Washington,9750
Italy,Tuscany,7281
France,Bordeaux,6111
Spain,Northern Spain,4892
...,...,...
Switzerland,Neuchâtel,1
Switzerland,Ticino,1
Switzerland,Valais,1
Switzerland,Vino da Tavola della Svizzera Italiana,1


In [52]:
countries_reviewed.sort_values(by=['country', 'len'])  # by country, then by review count within each country

country,province,len
Albania,Mirditë,2
Argentina,Other,889
Argentina,Mendoza Province,4742
Australia,Queensland,3
Australia,Tasmania,47
...,...,...
Uruguay,Colonia,6
Uruguay,San Jose,15
Uruguay,Uruguay,18
Uruguay,Canelones,19


## 6. Handling missing values

In [53]:
wine_reviews[pd.isnull(wine_reviews.country)]  # rows whose country is missing (NaN)

,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
1133,1133,,Delicate white flowers and a spin of lemon pee...,Askitikos,90,17.0,,,,Assyrtiko,Tsililis
1440,1440,,"A blend of 60% Syrah, 30% Cabernet Sauvignon a...",Shah,90,30.0,,,,Red Blend,Büyülübağ
68226,68226,,"From first sniff to last, the nose never makes...",Piedra Feliz,81,15.0,,,,Pinot Noir,Chilcas
113016,113016,,"From first sniff to last, the nose never makes...",Piedra Feliz,81,15.0,,,,Pinot Noir,Chilcas
135696,135696,,"From first sniff to last, the nose never makes...",Piedra Feliz,81,15.0,,,,Pinot Noir,Chilcas


There are two options: drop the rows (or columns) that contain NaN values, or replace the missing values with something else.

In [54]:
wine_reviews.region_2.fillna("Unknown")  # returns a copy of region_2 with every missing value replaced by "Unknown"

0                      Napa
1                   Unknown
2                    Sonoma
3         Willamette Valley
4                   Unknown
                ...        
150925              Unknown
150926              Unknown
150927              Unknown
150928              Unknown
150929              Unknown
Name: region_2, Length: 150930, dtype: object

In [55]:
wine_reviews.dropna(axis=0)  # drops every row that contains at least one NaN

,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
2,2,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,96,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
3,3,US,"This spent 20 months in 30% new French oak, an...",Reserve,96,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
8,8,US,This re-named vineyard was formerly bottled as...,Silice,95,65.0,Oregon,Chehalem Mountains,Willamette Valley,Pinot Noir,Bergström
9,9,US,The producer sources from two blocks of the vi...,Gap's Crown Vineyard,95,60.0,California,Sonoma Coast,Sonoma,Pinot Noir,Blue Farm
...,...,...,...,...,...,...,...,...,...,...,...
150889,150889,US,A bizarre style of wine. The aromas are Port-l...,Lafond Vineyard,82,35.0,California,Santa Ynez Valley,Central Coast,Pinot Noir,Lafond
150892,150892,US,"A light, earthy wine, with violet, berry and t...",Coastal,82,10.0,California,California,California Other,Merlot,Callaway
150914,150914,US,"Old-gold in color, and thick and syrupy. The a...",Late Harvest Cluster Select,94,25.0,California,Anderson Valley,Mendocino/Lake Counties,White Riesling,Navarro
150915,150915,US,"Decades ago, Beringer’s then-winemaker Myron N...",Nightingale,93,30.0,California,North Coast,North Coast,White Blend,Beringer
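
Before choosing between dropping and filling, it helps to count the missing values in each column. `dropna(subset=...)` restricts the check to the listed columns, and `fillna` accepts a dict of per-column replacements. A sketch on illustrative rows (using the median as the price filler is just one possible choice):

```python
import numpy as np
import pandas as pd

reviews = pd.DataFrame({'country': ['US', np.nan, 'Spain'],
                        'price': [235.0, 17.0, np.nan],
                        'region_2': ['Napa', np.nan, np.nan]})  # illustrative rows

print(reviews.isnull().sum())  # missing values per column

with_country = reviews.dropna(subset=['country'])  # drop only rows with no country
filled = reviews.fillna({'region_2': 'Unknown', 'price': reviews.price.median()})
```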


## Working with CSV and other file formats

### 1. Creating our own dataset and working on it as CSV

In [56]:
from io import StringIO, BytesIO

In [57]:
data = ('col1,col2,col3\n'
            'x,y,1\n'
            'a,b,2\n'
            'c,d,3')

In [60]:
df = pd.read_csv(StringIO(data))  # StringIO wraps the string in a file-like object that read_csv can parse
df

,col1,col2,col3
0,x,y,1
1,a,b,2
2,c,d,3


### 2. Reading a tab-separated file from a URL

In [63]:
df = pd.read_csv('https://download.bls.gov/pub/time.series/cu/cu.item',
                 sep='\t')  # the file is tab-separated, so set the separator explicitly

In [64]:
df.head()

,item_code,item_name,display_level,selectable,sort_sequence
0,AA0,All items - old base,0,T,2
1,AA0R,Purchasing power of the consumer dollar - old ...,0,T,399
2,SA0,All items,0,T,1
3,SA0E,Energy,1,T,374
4,SA0L1,All items less food,1,T,358


### 3. Reading JSON with pandas

In [65]:
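from io import StringIO

Data = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
pd.read_json(StringIO(Data))  # wrap literal JSON in StringIO; passing the raw string is deprecated in recent pandas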

,employee_name,email,job_profile
0,James,james@gmail.com,"{'title1': 'Team Lead', 'title2': 'Sr. Develop..."
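
`read_json` keeps the nested `job_profile` list in a single cell. `pd.json_normalize` can flatten such records instead: `record_path` names the nested list and `meta` lists the top-level fields to repeat on each resulting row. A sketch using the same record, parsed with `json.loads` first:

```python
import json

import pandas as pd

raw = '{"employee_name": "James", "email": "james@gmail.com", "job_profile": [{"title1":"Team Lead", "title2":"Sr. Developer"}]}'
record = json.loads(raw)

# One row per entry in job_profile, with the employee fields repeated alongside
flat = pd.json_normalize(record, record_path='job_profile', meta=['employee_name', 'email'])
print(flat)
```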


### 4. Reading HTML tables with pandas

In [66]:
# read_html is a quick way to scrape tables from web pages
url = 'https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/index.html'
dfs = pd.read_html(url)  # one DataFrame per <table>; needs lxml (or BeautifulSoup4 + html5lib)

In [67]:
dfs[0]  # read_html returns a list of tables, so take the first one

,Bank NameBank,CityCity,StateSt,CertCert,Acquiring InstitutionAI,Closing DateClosing,FundFund
0,Almena State Bank,Almena,KS,15426,Equity Bank,"October 23, 2020",10538
1,First City Bank of Florida,Fort Walton Beach,FL,16748,"United Fidelity Bank, fsb","October 16, 2020",10537
2,The First State Bank,Barboursville,WV,14361,"MVB Bank, Inc.","April 3, 2020",10536
3,Ericson State Bank,Ericson,NE,18265,Farmers and Merchants Bank,"February 14, 2020",10535
4,City National Bank of New Jersey,Newark,NJ,21111,Industrial Bank,"November 1, 2019",10534
...,...,...,...,...,...,...,...
558,"Superior Bank, FSB",Hinsdale,IL,32646,"Superior Federal, FSB","July 27, 2001",6004
559,Malta National Bank,Malta,OH,6629,North Valley Bank,"May 3, 2001",4648
560,First Alliance Bank & Trust Co.,Manchester,NH,34264,Southern New Hampshire Bank & Trust,"February 2, 2001",4647
561,National State Bank of Metropolis,Metropolis,IL,3815,Banterra Bank of Marion,"December 14, 2000",4646
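
The scraped headers repeat part of each label (`Bank NameBank`, `CityCity`, ...), presumably because of extra text inside the page's header cells. `rename(columns=...)` cleans them up. In this sketch the mapping is hand-written from the headers shown above (an assumption, not something read_html provides), and the single row reuses the first scraped record:

```python
import pandas as pd

# One row shaped like the scraped table, with the headers as returned above
banks = pd.DataFrame([['Almena State Bank', 'Almena', 'KS', 15426, 'Equity Bank', 'October 23, 2020', 10538]],
                     columns=['Bank NameBank', 'CityCity', 'StateSt', 'CertCert',
                              'Acquiring InstitutionAI', 'Closing DateClosing', 'FundFund'])

# Hypothetical mapping from the scraped names to clean ones
clean_names = {'Bank NameBank': 'Bank Name', 'CityCity': 'City', 'StateSt': 'State',
               'CertCert': 'Cert', 'Acquiring InstitutionAI': 'Acquiring Institution',
               'Closing DateClosing': 'Closing Date', 'FundFund': 'Fund'}
banks = banks.rename(columns=clean_names)
print(banks.columns.tolist())
```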


### 5. Reading an Excel file

In [69]:
df_excel = pd.read_excel('file.xlsx')  # .xlsx files are read with the openpyxl package, which must be installed

In [70]:
df_excel

,name,rollno,class
0,vanshika,23,6
1,vinay,32,5
2,nutan,22,4
