# Why use virtual environments in Jupyter Notebook:

### 1. **Keep projects separate:** Each notebook can use its own Python and package versions, so one project won’t break another.

**Example:** Project A needs pandas 1.5 and Project B needs pandas 2.0. Virtual environments let both coexist.

### 2. **Prevent code from breaking:** Updating Python or packages elsewhere on your computer does not touch the versions installed in the notebook's environment, so the notebook keeps running.

### 3. **Easier sharing and deployment:** When you share or deploy a notebook, you only need to list the packages that project's environment uses, not everything installed on your computer.

### 4. **Good practice for multiple projects:** Every new project should have its own environment to avoid confusion and messy dependencies.
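To confirm which environment a notebook kernel is actually using, you can inspect the interpreter from inside the notebook. The sketch below uses only the standard library. The variable name `in_virtual_env` is illustrative, and the prefix comparison assumes an environment created with `venv` (conda environments are not detected this way).

```python
import sys

# Path of the Python interpreter that runs this kernel
print(sys.executable)

# Inside a venv, sys.prefix points at the environment,
# while sys.base_prefix points at the base installation.
in_virtual_env = sys.prefix != sys.base_prefix
print("Running inside a virtual environment:", in_virtual_env)
```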

In [2]:
print

<function print(*args, sep=' ', end='\n', file=None, flush=False)>

## What is Pandas

- Pandas is an open-source Python library used for data manipulation and analysis.

- It provides two main data structures — Series (1-dimensional) and DataFrame (2-dimensional) — that make it easy to clean, analyze, and handle large datasets efficiently.

- It is widely used in data science, machine learning, and data analytics because it helps perform operations like filtering, grouping, merging, and reshaping data with simple syntax.

https://pandas.pydata.org/about/index.html

## Pandas Series vs DataFrame
**Series:**
- 1D labelled array.
- A single column.
- Stores data of a single type.
- E.g. a single column in a SQL table.


**DataFrame:**
- 2D labelled array.
- Multiple columns.
- Each column can have a different data type.
- E.g. a full SQL table.
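The comparison above can be sketched with a few lines of code. The names and values here (`ages`, `people`) are made up for illustration and are not from the datasets used later.

```python
import pandas as pd

# A Series: one labelled column with a single dtype
ages = pd.Series([25, 32, 41], name="age")

# A DataFrame: several labelled columns, each with its own dtype
people = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "age": [25, 32, 41],
})

print(ages.ndim, people.ndim)   # 1 2
print(people.dtypes)

# Selecting one column of a DataFrame gives back a Series
print(type(people["age"]))
```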

### Pandas Series

A Pandas Series is like a column in a table. It is a 1-D array holding data of any type.

### Importing Pandas

In [None]:
import numpy as np
import pandas as pd

### Series from lists

In [None]:
# string
country = ['India','Pakistan','USA','Nepal','Srilanka']

pd.Series(country)

> **Note:** A Series stores each value (strings, in this case) together with an index label. The index is generated automatically (0, 1, 2, ...) unless you supply one explicitly.

In [None]:
# integers
runs = [13,24,56,78,100]

runs_ser = pd.Series(runs)

### Custom Index

In [None]:
marks = [67,57,89,100]
subjects = ['maths','english','science','hindi']

pd.Series(marks,index=subjects)

### Setting a name for the Series

In [None]:
marks = pd.Series(marks,index=subjects,name="Shivam's marks")
marks

In [None]:
import sys
print(sys.executable)


### Series from dict

In [None]:
marks = {
    'maths':67,
    'english':57,
    'science':89,
    'hindi':100
}

marks_series = pd.Series(marks,name="Nitish's marks")
marks_series

### Series Attributes

### 1. size

In [None]:
marks_series.size

### 2. dtype

In [None]:
marks_series.dtype

### 3. name

In [None]:
marks_series.name

### 4. is_unique

In [None]:
marks_series.is_unique

pd.Series([1,1,2,3,4,5]).is_unique

### 5. index

In [None]:
marks_series.index

In [None]:
runs_ser.index

### 6. values

In [None]:
marks_series.values

---

### Series using read_csv

In [None]:
# with one col
subs = pd.read_csv('../0Resources/DataSets/subs.csv')
subs

In [None]:
type(subs)

> **Note:** By default, `read_csv` returns a `DataFrame`, even when the file has only one column. To convert it into a Series, call `.squeeze()` on the result.

### `squeeze()` function
Converts a single-column DataFrame into a Series.

In [None]:
subs = subs.squeeze()
type(subs)
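The behaviour can be checked without any file on disk. This sketch feeds `read_csv` an in-memory CSV through `io.StringIO`; the column name and numbers are hypothetical stand-ins for `subs.csv`.

```python
import io
import pandas as pd

# In-memory stand-in for a one-column CSV file (hypothetical data)
csv_text = "subscribers\n48\n57\n40\n"

df = pd.read_csv(io.StringIO(csv_text))
print(type(df))      # DataFrame, even with one column

ser = df.squeeze("columns")
print(type(ser))     # Series
```

Passing `"columns"` is optional. It restricts the squeeze to the column axis, so a file with a single row and a single column still yields a Series rather than a scalar.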

In [None]:
# with 2 cols
import pandas as pd
vk = pd.read_csv('../0Resources/DataSets/kohli_ipl.csv',index_col='match_no')
vk = vk.squeeze()
print(vk)
print(type(vk))

In [None]:
movies = pd.read_csv('../0Resources/DataSets/bollywood.csv',index_col='movie').squeeze()
movies

---

### Series methods

### 1. Head and tail
- `head()` returns the first 5 elements, whereas `tail()` returns the last 5.
- Both accept a number to control how many elements are returned.

In [None]:
subs.head()

In [None]:
vk.head(3)

In [None]:
vk.tail(10)

### 2. Sample
- Returns 1 randomly chosen element by default.
- Pass a number to get that many random elements.

> **Note:** Use `.sample()` when you want a random look at the data. The first or last few rows may not be representative, for example when the data is sorted.

In [None]:
movies.sample()

In [None]:
movies.sample(3)
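Because `.sample()` is random, its output changes on every run. A minimal sketch of making it repeatable with the `random_state` parameter, using a made-up `scores` Series:

```python
import pandas as pd

scores = pd.Series([10, 20, 30, 40, 50, 60])

# Same seed, same rows: useful when a notebook must be reproducible
first_draw = scores.sample(n=3, random_state=42)
second_draw = scores.sample(n=3, random_state=42)
print(first_draw.equals(second_draw))   # True

# frac draws a fraction of the elements instead of a fixed count
half = scores.sample(frac=0.5, random_state=0)
print(len(half))                        # 3
```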

### 3. value_counts()
- Returns how many times each unique value occurs, most frequent first.

In [None]:
movies.value_counts()
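A small illustration on a made-up Series (`votes` is not one of the datasets above) shows the ordering and the `normalize` option:

```python
import pandas as pd

votes = pd.Series(["a", "b", "a", "c", "a", "b"])

counts = votes.value_counts()
print(counts)          # a: 3, b: 2, c: 1

# normalize=True returns proportions instead of raw counts
shares = votes.value_counts(normalize=True)
print(shares["a"])     # 0.5
```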

### 4. sort_values()
- Returns the Series sorted by its values, in ascending order by default.

In [None]:
vk.sort_values()

In [None]:
# highest score: sort descending and take the first value
vk.sort_values(ascending=False).head(1).values[0]

In [None]:
vk.sort_values(ascending=False)

In [None]:
# sort_index -> inplace -> movies
movies.sort_index(ascending=False,inplace=True)

In [None]:
movies

In [None]:
vk.sort_values(inplace=True)

In [None]:
vk
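The difference between the default behaviour and `inplace=True` can be seen on a small example. The `runs` Series and its `m1`...`m4` labels are hypothetical.

```python
import pandas as pd

runs = pd.Series([45, 7, 82, 13], index=["m1", "m2", "m3", "m4"])

# Default: returns a new Series; the original keeps its order
ascending = runs.sort_values()
print(list(runs.index))        # ['m1', 'm2', 'm3', 'm4']
print(list(ascending.index))   # ['m2', 'm4', 'm1', 'm3']

# Reassigning the result is an alternative to inplace=True
runs = runs.sort_values(ascending=False)
print(runs.iloc[0])            # 82
```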

### Series Maths Methods

In [None]:
# count
vk.count()

In [None]:
# sum -> product
subs.sum()

In [None]:
# mean -> median -> mode -> std -> var
print(subs.mean())
print(vk.median())
print(movies.mode())
print(subs.std())
print(vk.var())

In [None]:
# min/max
subs.max()

In [None]:
# describe
subs.describe()
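On a tiny made-up Series the statistics are easy to verify by hand. Note that `var()` and `std()` use the sample formula (`ddof=1`) unless told otherwise.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 6])

print(s.mean())            # 4.0
print(s.median())          # 4.0
print(s.mode().tolist())   # [4]

# Sample variance divides by n - 1; ddof=0 divides by n
sample_var = s.var()
population_var = s.var(ddof=0)
print(sample_var, population_var)
```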

### Series Indexing

In [None]:
# integer indexing
x = pd.Series([12,13,14,35,46,57,58,79,9])
x

In [None]:
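# negative indexing
# x[-1] is a label lookup on the integer index and raises KeyError; x.iloc[-1] is positional
x[-1]

In [None]:
movies

In [None]:
# also a label lookup if match_no holds integers, so this raises KeyError too
vk[-1]

In [None]:
# positional fallback on a string index (deprecated in recent pandas; prefer .iloc[-1])
marks_series[-1]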

In [None]:
# slicing
vk[5:16]

In [None]:
# negative slicing
vk[-5:]

In [None]:
movies[::2]

In [None]:
# fancy indexing
vk[[1,3,4,5]]

In [None]:
# indexing with a label
movies['2 States (2014 film)']
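The cells above mix label-based and position-based lookups through plain `[]`. A minimal sketch of the explicit accessors, `.loc` for labels and `.iloc` for positions, on a made-up Series whose integer labels deliberately differ from the positions:

```python
import pandas as pd

x = pd.Series([12, 13, 14, 35], index=[10, 20, 30, 40])

# .loc looks up by label
print(x.loc[20])              # 13

# .iloc looks up by position, so negative positions work
print(x.iloc[-1])             # 35
print(x.iloc[1:3].tolist())   # [13, 14]

# Plain [] with an integer on an integer index means "label":
# x[-1] would raise KeyError because no label -1 exists.
```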

In [None]:
import numpy as np

### Editing Series

In [None]:
# using positional indexing (.iloc makes the position explicit)
marks_series.iloc[1] = 100
marks_series

In [None]:
# what if an index does not exist
marks_series['evs'] = 100

In [None]:
marks_series

In [None]:
# slicing
runs_ser[2:4] = [100,100]
runs_ser

In [None]:
# fancy indexing
runs_ser[[0,3,4]] = [0,0,0]
runs_ser

In [None]:
# using index label
movies['2 States (2014 film)'] = 'Alia Bhatt'
movies

### Copy and Views

### Series with Python Functionalities

In [None]:
# len/type/dir/sorted/max/min
print(len(subs))
print(type(subs))
print(dir(subs))
print(sorted(subs))
print(min(subs))
print(max(subs))

In [None]:
# type conversion
list(marks_series)

In [None]:
dict(marks_series)

In [None]:
# membership operator

'2 States (2014 film)' in movies

In [None]:
'Alia Bhatt' in movies.values

In [None]:
movies

In [None]:
# looping
for i in movies.index:
  print(i)

In [None]:
# Arithmetic Operators(Broadcasting)
100 + marks_series

In [None]:
# Relational Operators

vk >= 50

### Boolean Indexing on Series

In [None]:
# Find no of 50's and 100's scored by kohli
vk[vk >= 50].size

In [None]:
# find number of ducks
vk[vk == 0].size

In [None]:
# count the number of days with more than 200 subs
subs[subs > 200].size

In [None]:
# find actors who have done more than 20 movies
num_movies = movies.value_counts()
num_movies[num_movies > 20]
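Each query above builds a boolean mask and uses it to filter the Series. The sketch below spells that out on a made-up `scores` Series and shows how two conditions can be combined.

```python
import pandas as pd

scores = pd.Series([0, 52, 18, 100, 0, 75])

mask = scores >= 50            # boolean Series, one flag per element
print(scores[mask].tolist())   # [52, 100, 75]

# Summing a boolean mask counts the True values directly
print(mask.sum())              # 3

# Combine conditions with & and |, wrapping each in parentheses
fifties = scores[(scores >= 50) & (scores < 100)]
print(fifties.size)            # 2
```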

### Plotting Graphs on Series

In [None]:
subs.plot()

In [None]:
movies.value_counts().head(20).plot(kind='pie')

### Some Important Series Methods

In [None]:
# astype
# between
# clip
# drop_duplicates
# isnull
# dropna
# fillna
# isin
# apply
# copy

In [None]:
import numpy as np
import pandas as pd

In [None]:
# the squeeze=True argument of read_csv was removed in pandas 2.0; call .squeeze() on the result
subs = pd.read_csv('/content/subs.csv').squeeze('columns')
subs

In [None]:
vk = pd.read_csv('/content/kohli_ipl.csv',index_col='match_no').squeeze('columns')
vk

In [None]:
movies = pd.read_csv('/content/bollywood.csv',index_col='movie').squeeze('columns')
movies

In [None]:
# astype
import sys
sys.getsizeof(vk)

In [None]:
sys.getsizeof(vk.astype('int16'))
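`sys.getsizeof` includes object overhead, so the saving is easier to see with `memory_usage(index=False)`, which reports only the data buffer in bytes. A minimal sketch on a made-up Series of 1000 small integers:

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(1000), dtype="int64")

# 8 bytes per element as int64, 2 bytes per element as int16
print(values.memory_usage(index=False))                   # 8000
print(values.astype("int16").memory_usage(index=False))   # 2000

# Check that the data fits the smaller type before converting
print(values.max() <= np.iinfo("int16").max)              # True
```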

In [None]:
# between
vk[vk.between(51,99)].size

In [None]:
# clip
subs

In [None]:
subs.clip(100,200)
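The effect of `between` and `clip` is easiest to see on a few hand-picked values. The Series `s` below is made up for illustration.

```python
import pandas as pd

s = pd.Series([40, 120, 250, 180])

# Values below 100 become 100, values above 200 become 200
print(s.clip(100, 200).tolist())      # [100, 120, 200, 180]

# between() is inclusive on both ends by default
print(s.between(120, 180).tolist())   # [False, True, False, True]
```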

In [None]:
# drop_duplicates
temp = pd.Series([1,1,2,2,3,3,4,4])
temp

In [None]:
temp.drop_duplicates(keep='last')

In [None]:
temp.duplicated().sum()

In [None]:
vk.duplicated().sum()

In [None]:
movies.drop_duplicates()
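The `keep` argument decides which occurrence survives. Printing the index labels that remain makes the difference visible; this is a sketch on a short made-up Series.

```python
import pandas as pd

temp = pd.Series([1, 1, 2, 2, 3])

# keep='first' (the default) keeps the first occurrence of each value
print(temp.drop_duplicates().index.tolist())              # [0, 2, 4]

# keep='last' keeps the last occurrence instead
print(temp.drop_duplicates(keep="last").index.tolist())   # [1, 3, 4]

# duplicated() flags every repeat after the first occurrence
print(temp.duplicated().sum())                            # 2
```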

In [None]:
temp = pd.Series([1,2,3,np.nan,5,6,np.nan,8,np.nan,10])
temp

In [None]:
temp.size

In [None]:
temp.count()

In [None]:
# isnull
temp.isnull().sum()

In [None]:
# dropna
temp.dropna()

In [None]:
# fillna
temp.fillna(temp.mean())

In [None]:
# isin
vk[(vk == 49) | (vk == 99)]

In [None]:
vk[vk.isin([49,99])]

In [None]:
# apply
movies

In [None]:
movies.apply(lambda x:x.split()[0].upper())

In [None]:
subs

In [None]:
subs.apply(lambda x:'good day' if x > subs.mean() else 'bad day')
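The lambda above calls `subs.mean()` once per element. A possible variation computes the mean once, and `np.where` gives a vectorized alternative to `apply`. The numbers below are hypothetical daily counts, not the real `subs.csv` data.

```python
import numpy as np
import pandas as pd

subs = pd.Series([48, 57, 140, 210, 95])   # hypothetical daily counts

# Compute the mean once instead of inside the lambda
avg = subs.mean()                          # 110.0
labels = subs.apply(lambda x: "good day" if x > avg else "bad day")
print(labels.tolist())
# ['bad day', 'bad day', 'good day', 'good day', 'bad day']

# Vectorized alternative: no Python-level function call per element
vectorized = np.where(subs > avg, "good day", "bad day")
print(vectorized.tolist())
```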

In [None]:
subs.mean()

In [None]:
# copy

In [None]:
vk

In [None]:
# without .copy(), edits to `new` may also show up in vk (depends on the pandas version)
new = vk.head()

In [None]:
new

In [None]:
new[1] = 1

In [None]:
# .copy() makes `new` independent of vk
new = vk.head().copy()

In [None]:
new[1] = 100

In [None]:
new

In [None]:
vk
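The safe half of this experiment can be reproduced on a small made-up Series. With `.copy()`, the original is never affected. Without it, whether the original changes depends on the pandas version and its Copy-on-Write setting, so this sketch only demonstrates the `.copy()` case.

```python
import pandas as pd

# Stand-in for vk with integer labels starting at 1 (hypothetical values)
vk = pd.Series([1, 23, 13, 12, 1, 9], index=[1, 2, 3, 4, 5, 6])

# .copy() gives an independent Series
new = vk.head().copy()
new[1] = 100    # label 1, because the index is integer

print(new[1])   # 100
print(vk[1])    # 1, the original is untouched
```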