### What is Pandas

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

https://pandas.pydata.org/about/index.html

### Pandas Series

A Pandas Series is like a column in a table: a one-dimensional labeled array that can hold data of any type.
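
This minimal sketch illustrates both properties, one dimension with labels and any data type. The values are hypothetical and chosen only to mix four Python types.

```python
import pandas as pd

# Hypothetical values of four different Python types in one Series
mixed = pd.Series([10, "ten", 10.5, True])

print(mixed.ndim)          # 1: a Series is always one-dimensional
print(mixed.dtype)         # object: mixed types fall back to the generic dtype
print(list(mixed.index))   # [0, 1, 2, 3]: default integer labels
```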

### Importing Pandas

In [51]:
import numpy as np  
import pandas as pd  

### Series from lists

In [52]:
# string
country = ['India','Pakistan','USA','Nepal','Srilanka']

pd.Series(country)

0       India
1    Pakistan
2         USA
3       Nepal
4    Srilanka
dtype: object

In [53]:
# integers
runs = [13,24,56,78,100]

runs_ser = pd.Series(runs)

In [54]:
# custom index
marks = [67,57,89,100]
subjects = ['maths','english','science','hindi']

pd.Series(marks,index=subjects)

maths       67
english     57
science     89
hindi      100
dtype: int64

In [55]:
# setting a name
marks = pd.Series(marks,index=subjects,name='Nitish ke marks')
marks

maths       67
english     57
science     89
hindi      100
Name: Nitish ke marks, dtype: int64

### Series from dict

In [56]:
marks = {
    'maths':67,
    'english':57,
    'science':89,
    'hindi':100
}

marks_series = pd.Series(marks,name='nitish ke marks')
marks_series

maths       67
english     57
science     89
hindi      100
Name: nitish ke marks, dtype: int64
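
When a dict is combined with `index=`, the index selects and orders the dict keys. Any label missing from the dict becomes `NaN`, which also turns the integers into floats. This is a sketch; the label `'sanskrit'` is a hypothetical subject that does not appear in the original data.

```python
import pandas as pd

marks = {'maths': 67, 'english': 57, 'science': 89, 'hindi': 100}

# index= picks and orders keys; 'sanskrit' (hypothetical) is not in the dict, so it becomes NaN
subset = pd.Series(marks, index=['science', 'maths', 'sanskrit'])
print(subset)   # science -> 89.0, maths -> 67.0, sanskrit -> NaN; dtype float64
```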

### Series Attributes

In [57]:
# size
marks_series.size

4

In [58]:
# dtype
marks_series.dtype

dtype('int64')

In [59]:
# name
marks_series.name

'nitish ke marks'

In [60]:
# is_unique
marks_series.is_unique  # True, but not shown: a cell only echoes its last expression

pd.Series([1,1,2,3,4,5]).is_unique

False
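
Printing both checks makes each result visible. The `duplicated()` call is an extra, used here to show which value breaks uniqueness.

```python
import pandas as pd

marks_series = pd.Series({'maths': 67, 'english': 57, 'science': 89, 'hindi': 100})
repeated = pd.Series([1, 1, 2, 3, 4, 5])

print(marks_series.is_unique)   # True: all four marks differ
print(repeated.is_unique)       # False: 1 appears twice

# duplicated() flags every repeat after the first occurrence
print(repeated[repeated.duplicated()].tolist())   # [1]
```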

In [61]:
# index
marks_series.index

Index(['maths', 'english', 'science', 'hindi'], dtype='object')

In [62]:
runs_ser.index

RangeIndex(start=0, stop=5, step=1)

In [63]:
# values
marks_series.values

array([ 67,  57,  89, 100])
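
The pandas documentation recommends `Series.to_numpy()` over `.values` when you want a NumPy array. A short sketch showing that it returns the same data:

```python
import numpy as np
import pandas as pd

marks_series = pd.Series({'maths': 67, 'english': 57, 'science': 89, 'hindi': 100})

arr = marks_series.to_numpy()   # explicit conversion to a NumPy array
print(type(arr))                # <class 'numpy.ndarray'>
print(arr)                      # [ 67  57  89 100]
```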

### Series using read_csv

In [64]:
# with one col
subs = pd.read_csv('subs.csv')
subs

     Subscribers gained
0                    48
1                    57
2                    40
3                    43
4                    44
...                 ...
360                 231
361                 226
362                 155
363                 144


In [65]:
# with 2 cols
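# older pandas: vk = pd.read_csv('kohli_ipl.csv',index_col='match_no',squeeze=True)
# squeeze= was removed from read_csv in pandas 2.0, so the line below returns a one-column DataFrame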
vk = pd.read_csv('kohli_ipl.csv',index_col='match_no')
vk

          runs
match_no
1            1
2           23
3           13
4           12
5            1
...        ...
211          0
212         20
213         73
214         25
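
Since pandas 2.0, `read_csv` has no `squeeze=` keyword. One way to get a Series is to call `DataFrame.squeeze('columns')` on the result, which collapses a single remaining column into a Series. The sketch below assumes an in-memory stand-in for `kohli_ipl.csv` built from the first three rows shown above. It is not the real file.

```python
import io
import pandas as pd

# In-memory stand-in for kohli_ipl.csv (first three rows only)
csv_text = "match_no,runs\n1,1\n2,23\n3,13\n"

df = pd.read_csv(io.StringIO(csv_text), index_col='match_no')
print(type(df).__name__)        # DataFrame

vk = df.squeeze('columns')      # one data column left, so it becomes a Series
print(type(vk).__name__)        # Series
print(vk.name, vk.index.name)   # runs match_no
```

The same `.squeeze('columns')` call would turn the `movies` DataFrame into a Series indexed by `movie`.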


In [66]:
# older pandas: movies = pd.read_csv('bollywood.csv',index_col='movie',squeeze=True)
# without squeeze= (removed in pandas 2.0) this also returns a one-column DataFrame
movies = pd.read_csv('bollywood.csv',index_col='movie')
movies

                                                  lead
movie
Uri: The Surgical Strike                 Vicky Kaushal
Battalion 609                              Vicky Ahuja
The Accidental Prime Minister (film)       Anupam Kher
Why Cheat India                          Emraan Hashmi
Evening Shadows                       Mona Ambegaonkar
...                                                ...
Hum Tumhare Hain Sanam                  Shah Rukh Khan
Aankhen (2002 film)                   Amitabh Bachchan
Saathiya (film)                           Vivek Oberoi
Company (film)                              Ajay Devgn


In [67]:
# head and tail
subs.head()

   Subscribers gained
0                  48
1                  57
2                  40
3                  43
4                  44


In [68]:
vk.head(3)

          runs
match_no
1            1
2           23
3           13


In [69]:
vk.tail(10)

          runs
match_no
206          0
207          0
208          9
209         58
210         30
211          0
212         20
213         73
214         25
215          7
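
`head()` defaults to 5 rows, and a negative `n` keeps everything except the last `|n|` rows. A quick sketch on a short Series of illustrative run values:

```python
import pandas as pd

# Illustrative values only
runs = pd.Series([1, 23, 13, 12, 1, 0, 20], name='runs')

print(runs.head().tolist())     # [1, 23, 13, 12, 1]  (default n=5)
print(runs.tail(2).tolist())    # [0, 20]
print(runs.head(-2).tolist())   # [1, 23, 13, 12, 1]  (all but the last 2)
```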


In [70]:
# sample
movies.sample(5)

                                          lead
movie
Balwinder Singh Famous Ho Gaya          Asrani
Gabbar Is Back                    Akshay Kumar
Love Story 2050                  Harman Baweja
Running Shaadi                      Arsh Bajwa
Delhi-6                         Waheeda Rehman
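
`sample()` draws different rows on each run, so the five movies above will change every time the cell executes. Passing `random_state` makes the draw reproducible. A sketch using a few lead actors from the table above (the seed `42` is arbitrary):

```python
import pandas as pd

leads = pd.Series(['Vicky Kaushal', 'Vicky Ahuja', 'Anupam Kher',
                   'Emraan Hashmi', 'Mona Ambegaonkar'])

# Same seed, same rows
first = leads.sample(3, random_state=42)
second = leads.sample(3, random_state=42)
print(first.equals(second))   # True
```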


In [71]:
# value_counts -> movies
movies.value_counts()

lead            
Akshay Kumar        48
Amitabh Bachchan    45
Ajay Devgn          38
Salman Khan         31
Sanjay Dutt         26
                    ..
Aashish Bhatt        1
Abhimanyu Dasani     1
Abhishek Bharate     1
Adhvik Mahajan       1
Aditya Narayan       1
Name: count, Length: 566, dtype: int64
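
`value_counts()` sorts by frequency, highest first. With `normalize=True` it returns each value's share instead of a raw count. A small sketch on a hypothetical four-movie subset:

```python
import pandas as pd

# Hypothetical subset of the 'lead' column
leads = pd.Series(['Akshay Kumar', 'Ajay Devgn', 'Akshay Kumar', 'Salman Khan'], name='lead')

counts = leads.value_counts()                 # raw counts, most frequent first
shares = leads.value_counts(normalize=True)   # fractions that sum to 1
print(counts.iloc[0], shares.iloc[0])         # 2 0.5
```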

In [75]:
# sort_values -> inplace
vk.sort_values(by='runs',ascending=False).head(1).values[0]

In [73]:
vk.sort_values(by='runs',ascending=False)
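
The `by` argument is needed only because `vk` is a DataFrame here. A Series sorts its own values, while a DataFrame must be told which column to sort on. A sketch using the first four Kohli scores shown above:

```python
import pandas as pd

runs = pd.Series([1, 23, 13, 12],
                 index=pd.Index([1, 2, 3, 4], name='match_no'), name='runs')

# Series: no `by`, the values themselves are sorted
top_series = runs.sort_values(ascending=False)

# DataFrame: `by` names the column(s) to sort on
top_frame = runs.to_frame().sort_values(by='runs', ascending=False)

print(top_series.iloc[0], top_frame['runs'].iloc[0])   # 23 23
```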

In [77]:
# sort_index -> inplace -> movies
movies.sort_index(ascending=False,inplace=True)

In [78]:
vk.sort_values(by='runs',inplace=True)
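
With `inplace=True` the object itself is modified and the method returns `None`, which is why those cells print nothing. The common alternative is to reassign the returned copy. A sketch on three rows taken from the `movies` table:

```python
import pandas as pd

movies = pd.DataFrame(
    {'lead': ['Ajay Devgn', 'Vicky Kaushal', 'Vicky Ahuja']},
    index=pd.Index(['Company (film)', 'Uri: The Surgical Strike', 'Battalion 609'], name='movie'),
)

result = movies.sort_index(ascending=False, inplace=True)
print(result)            # None: movies itself was reordered
print(movies.index[0])   # Uri: The Surgical Strike

# Equivalent without inplace: keep the returned copy
movies = movies.sort_index(ascending=False)
```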