### What is Pandas

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool,
built on top of the Python programming language.

https://pandas.pydata.org/about/index.html

### Pandas Series

A Pandas Series is like a column in a table. It is a 1-D array holding data of any type.

### Importing Pandas
- It is common practice to import NumPy alongside pandas, since the two are often used together.

In [None]:
import pandas as pd 
import numpy as np

### Series from lists
1. Using the default index.
2. Using a custom index.

#### 1. Using the default index.

In [None]:
# Strings.
# The dtype shown in the output is the type of data the Series holds.
countries = pd.Series(['India', 'Australia', 'USA', 'UK', 'Germany', 'Kuwait'])
countries

In [None]:
# integers. 
runs = [23, 45, 67, 98, 40, 30]
runs_ser = pd.Series(runs)
runs_ser

#### 2. Using a custom index.

In [None]:
# Example 1

runs = [23, 45, 67, 98, 40, 30]

runs_series = pd.Series(runs, index = range(1, len(runs) + 1))
runs_series

In [None]:
# Example 2 

marks = [67,57,89,100,90]
subjects = ['Maths', 'English', 'Physics', 'Chemistry', 'Hindi']

pd.Series(marks, index = subjects)

In [None]:
# Setting a name for the Series.
marks_series = pd.Series(marks, index=subjects, name="I don't know which clever student's marks these are")

marks_series

### Series from dict

In [None]:
# The dict keys become the index of the Series.

marks = {
    'Maths' : 67,
    'English' : 57,
    'Physics' : 89,
    'Chemistry' : 100,
    'Hindi' : 100
}

pd.Series(marks)

### Series Attributes.
1. size : returns the number of rows (including missing values).
2. dtype : returns the data type of the values the Series contains.
3. name : returns the name of the Series.
4. is_unique : returns True if every value in the Series is unique, otherwise False.
5. index : returns the index labels of the Series.
6. values : returns the values of the Series as an array.

In [None]:
# size
marks_series.size

In [None]:
# dtype. 
marks_series.dtype

In [None]:
# name 
marks_series.name

In [None]:
# is_unique

print(marks_series.is_unique)
print(pd.Series([1, 1, 2, 2, 3, 4, 5], index=range(1, 8)).is_unique)

In [None]:
# index 
marks_series.index

In [None]:
# values. 
marks_series.values

### Series using read_csv

In [None]:
# CSV with one column: squeeze() turns the one-column DataFrame into a Series.
subs = pd.read_csv('./Datasets/subs.csv').squeeze()
subs

In [None]:
# CSV with two columns: one becomes the index, the other the values.
vk = pd.read_csv('./Datasets/kohli_ipl.csv', index_col='match_no').squeeze()
vk

In [None]:
# Another two-column CSV, indexed by movie title.

movies = pd.read_csv('./Datasets/bollywood.csv', index_col='movie').squeeze()
movies
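The CSV files above are local datasets, so the cells cannot be run without them. As a minimal sketch of what `squeeze()` does, the snippet below reads a small in-memory CSV instead. The column names and numbers are invented for illustration, and `"columns"` is passed explicitly (an assumption on my part, not what the notebook does) so that only the column axis is collapsed.

```python
import io
import pandas as pd

# Hypothetical two-column CSV; the data is invented for illustration.
csv_text = "match_no,runs\n1,10\n2,55\n3,0\n"

# With index_col set, one data column remains, so read_csv returns
# a one-column DataFrame.
df = pd.read_csv(io.StringIO(csv_text), index_col="match_no")
print(type(df))

# squeeze("columns") collapses that single column into a Series.
runs_demo = df.squeeze("columns")
print(type(runs_demo))
print(runs_demo)
```

The column header becomes the name of the Series, and the `index_col` column becomes its index.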

### Series methods.
1. **head** : returns the first five rows of the Series by default (pass n to change this).
2. **tail** : returns the last five rows by default (pass n to change this).
3. **sample** : returns one random row by default (pass n to change this).
4. **value_counts** : returns the count of each distinct value.
5. **sort_values** : sorts the Series by its values (can also be done in place with inplace=True).
6. **sort_index** : sorts the Series by its index (can also be done in place with inplace=True).

In [None]:
# head function. 
movies.head()

In [None]:
vk.head(3)

In [None]:
# tail function

In [None]:
vk.tail()

In [None]:
subs.head()

In [None]:
subs.tail(10)

In [None]:
subs.sample()

In [None]:
movies.sample(5)

In [None]:
# value_counts 

movies.value_counts()

In [None]:
# sort_values (Ascending order)
vk.sort_values()

In [None]:
# descending order
vk.sort_values(ascending=False)

In [None]:
# sort_index
movies

In [None]:
# Sorts the index in ascending order.
movies.sort_index().head()

In [None]:
# Sorts the index in descending order.
movies.sort_index(ascending=False)
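The method list above mentions that sorting can also be done in place, but the cells only show the default behaviour. A small sketch of the difference, using made-up marks rather than the notebook's datasets:

```python
import pandas as pd

# Made-up marks, used only for illustration.
demo = pd.Series([67, 57, 89], index=["Maths", "English", "Physics"])

# Default: sort_values returns a new Series and leaves demo unchanged.
sorted_copy = demo.sort_values()
print(list(demo.index))   # original order is kept

# inplace=True modifies demo itself and returns None.
result = demo.sort_values(inplace=True)
print(result)
print(list(demo.index))   # now ordered by value
```

Because the in-place call returns `None`, assigning its result back to the variable would discard the Series.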

### Series Maths Methods
1. **count** : counts the non-missing rows.
2. **sum** : returns the sum of the values.
3. **mean/median/std/var** : each returns a single summary value; **mode** returns a Series, since several values can tie.
4. **min/max** : returns the minimum/maximum value of the Series.
5. **describe** : returns summary statistics (count, mean, std, min, quartiles, max) for the Series.

In [None]:
# count function
print(subs.count())
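`count` and the `size` attribute only differ when values are missing. A minimal sketch with made-up data containing one `NaN`:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value.
s = pd.Series([10, np.nan, 30])

print(s.size)     # 3: every row, including the missing one
print(s.count())  # 2: only the non-missing rows
```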

In [None]:
# sum function. 
print(subs.sum())

In [None]:
# mean function. 
print(vk.mean())

# median 
print(vk.median()) 

# mode 
print(movies.mode()) 

# standard deviation
print(vk.std()) 

# variance
print(vk.var())

In [None]:
# min/max 
print(subs.min())
print(subs.max())

In [None]:
# describe function. 
vk.describe()
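Unlike the other statistics, `mode` and `describe` return a Series rather than a single number. The sketch below uses made-up values where two numbers tie for most frequent:

```python
import pandas as pd

# Made-up values in which 1 and 2 are tied for most frequent.
s = pd.Series([1, 1, 2, 2, 3])

modes = s.mode()
print(type(modes))    # a Series, because ties are possible
print(list(modes))    # both tied values are returned

# describe() also returns a Series, indexed by statistic name.
summary = s.describe()
print(list(summary.index))
```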

### Series Indexing

In [None]:
# integer indexing
x = pd.Series([12,13,14,35,46,57,58,79,9])

In [None]:
# positive indexing
x[3]

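`NOTE : ` Negative indexing with `[]` does not work on a Series that has an integer index: `x[-1]` is treated as the label -1 and raises a `KeyError`. On a Series with a non-integer index (such as `movies`), older pandas versions fall back to positional lookup, but that fallback is deprecated in recent releases; `.iloc[-1]` is the explicit positional form.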

In [None]:
x[-1]

In [None]:
movies[-5]
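A short sketch of the behaviour described in the note, using the same values as `x`. The name `x_demo` is only for this example.

```python
import pandas as pd

# Same values as x above, with the default integer index 0..8.
x_demo = pd.Series([12, 13, 14, 35, 46, 57, 58, 79, 9])

# x_demo[-1] looks for the *label* -1, which does not exist.
try:
    x_demo[-1]
except KeyError:
    print("KeyError: no label -1 in the index")

# .iloc is always positional, so negative positions work.
print(x_demo.iloc[-1])   # 9, the last element
```

`.iloc` behaves the same way regardless of the index type, so it is the safer choice whenever a position is meant rather than a label.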

In [None]:
# Positive slicing in pandas series. 
vk[5 : 16]

In [None]:
# Negative slicing in pandas series. 
vk[-5 : ]

In [None]:
movies[ : : 2]

In [None]:
# fancy indexing 
vk[[1, 4, 5, 7, 8]]

In [None]:
# indexing with a single label
movies['2 States (2014 film)']

### Editing Series.
- Pandas lets us edit a Series in place, but it is not recommended.

In [None]:
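# using positional indexing (.iloc is explicit; plain [1] on a
# label index relies on a fallback that is deprecated in recent pandas)
marks_series.iloc[1] = 100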
marks_series

In [None]:
# assigning to a label that does not exist adds a new entry
marks_series['evs'] = 100

In [None]:
marks_series

In [None]:
# slicing
runs_ser[2:4] = [100,100]
runs_ser

In [None]:
# fancy indexing
runs_ser[[0,3,4]] = [0,0,0]
runs_ser

In [None]:
# using index label
movies['2 States (2014 film)'] = 'Alia Bhatt'
movies
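The notebook does not say why in-place edits are discouraged. One likely reason is that they silently change data that later cells still rely on. A sketch of one way around this, assuming it is acceptable to work on a copy (the data is made up for illustration):

```python
import pandas as pd

# Made-up marks, used only for illustration.
original = pd.Series([67, 57, 89], index=["Maths", "English", "Physics"])

# Work on an explicit copy so the original Series is not touched.
edited = original.copy()
edited["English"] = 100   # overwrite an existing label
edited["EVS"] = 95        # a new label is appended

print(original.to_dict())
print(edited.to_dict())
```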

### Series with Python Functionalities

In [None]:
# len/type/dir/sorted/max/min
print(len(subs))
print(type(subs))
print(dir(subs))
print(sorted(subs))
print(min(subs))
print(max(subs))

In [None]:
# type conversion
list(marks_series)

In [None]:
dict(marks_series)

In [None]:
# membership operator: `in` checks the index labels, not the values
'2 States (2014 film)' in movies

In [None]:
'Alia Bhatt' in movies.values
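The two cells above differ because `in` on a Series looks at the index only. A self-contained sketch with a made-up film-to-actor mapping:

```python
import pandas as pd

# Made-up film-to-actor mapping, used only for illustration.
leads = pd.Series({"Film A": "Actor X", "Film B": "Actor Y"})

print("Film A" in leads)           # True: 'in' looks at the index
print("Actor X" in leads)          # False: values are not searched
print("Actor X" in leads.values)   # True: search the values explicitly
```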

In [None]:
movies

In [None]:
# looping iterates over the values, not the index
for i in movies:
    print(i)

In [None]:
# arithmetic operator. 
100 - marks_series

In [None]:
# Relational operator.

# This returns a boolean Series that is True wherever the condition holds.
# It is also called a boolean mask.
vk >= 50

### Boolean Indexing on Series

In [None]:
# Find the number of innings in which Kohli scored 50 or more.

vk[vk >= 50].size

In [None]:
# Find the number of ducks (scores of 0).

# boolean mask
vk == 0

vk[vk == 0].size

In [None]:
# Count the number of days with more than 200 subs.

subs > 200

subs[subs > 200].size

In [None]:
# find actors who have done more than 20 movies. 

num_movies = movies.value_counts() 

# boolean mask
num_movies > 20

num_movies[num_movies > 20]
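The cells above all follow the same pattern: build a boolean mask, then use it to filter. A self-contained sketch with made-up scores shows the pattern, and also that summing the mask gives the same count as filtering and taking `.size`:

```python
import pandas as pd

# Made-up scores indexed by match number, used only for illustration.
scores = pd.Series([0, 75, 12, 50, 0, 101], index=range(1, 7))

mask = scores >= 50               # boolean Series, True where the condition holds
print(scores[mask])               # keeps only the rows where mask is True
print(scores[mask].size)          # number of matching rows
print(int(mask.sum()))            # same count: each True is summed as 1
print(int((scores == 0).sum()))   # number of zero scores
```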

### Plotting Graphs on Series

In [None]:
subs.plot()

In [None]:
movies.value_counts().head(20).plot(kind = 'bar')

In [None]:
movies.value_counts().head(10).plot(kind = 'pie')