
# What is Pandas

Pandas is an open-source data analysis and manipulation tool built on top of the Python programming language.

https://pandas.pydata.org/about/index.html

## Series

In pandas, a Series is a one-dimensional labeled array capable of holding data of any type (integer, float, string, Python objects, etc.). It is similar to a one-dimensional NumPy array but with additional features like index labels.
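
As a minimal sketch of that difference (the values and labels below are made up for illustration), the same data can be held in a NumPy array and in a Series, but only the Series can be looked up by label:

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the notebook
arr = np.array([10, 20, 30])
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(arr[1])     # position-based access on the NumPy array
print(s["b"])     # label-based access on the Series
print(s.iloc[1])  # position-based access is still available through .iloc
```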

## Importing pandas and NumPy

In [None]:
import pandas as pd
import numpy as np

### Series from a list

In [None]:
#String
country =['India', 'Pakistan', 'Japan', 'Bangladesh', 'USA', 'Austrilia', 'England', 'China']
country_series= pd.Series(country)
country_series

0         India
1      Pakistan
2         Japan
3    Bangladesh
4           USA
5     Austrilia
6       England
7         China
dtype: object

In [None]:
#integer
runs= [40, 50, 60, 200, 100, 30]
pd.Series(runs)

0     40
1     50
2     60
3    200
4    100
5     30
dtype: int64

In [None]:
#Custom index
marks = [57, 86, 90, 100, 67]
subjects= ['Hindi', 'English', 'maths', 'Science', 'Social science']
pd.Series(marks, index=subjects)

Hindi              57
English            86
maths              90
Science           100
Social science     67
dtype: int64

In [None]:
salary = ['10k', '20k', '30k', '40k', '50k', '100k', '200k', '300k']
experience = [2, 3, 5, 6, 7, 8, 9, 10]

pd.Series(salary, index=experience)

2      10k
3      20k
5      30k
6      40k
7      50k
8     100k
9     200k
10    300k
dtype: object

In [None]:
# Setting a name: we can give a Series a name
pd.Series(salary, index=experience, name="Salaries")


2      10k
3      20k
5      30k
6      40k
7      50k
8     100k
9     200k
10    300k
Name: Salaries, dtype: object

### Series from a dict

In [None]:
marks_dict = {
    'Hindi': 30,
    'English': 50,
    'Maths': 100,
    'Bio': 49,
    'Science': 87
}

marks_dict

{'Hindi': 30, 'English': 50, 'Maths': 100, 'Bio': 49, 'Science': 87}

In [None]:
marks_series= pd.Series(marks_dict, name='shivays_marks')
marks_series

Hindi       30
English     50
Maths      100
Bio         49
Science     87
Name: shivays_marks, dtype: int64

### Series Attributes

In [None]:
country_series

0         India
1      Pakistan
2         Japan
3    Bangladesh
4           USA
5     Austrilia
6       England
7         China
dtype: object

In [None]:
marks_series

Hindi       30
English     50
Maths      100
Bio         49
Science     87
Name: shivays_marks, dtype: int64

In [None]:
#size
print(country_series.size)
print(marks_series.size)



8
5


In [None]:
#dtype
print(country_series.dtype)
print(marks_series.dtype)

object
int64


In [None]:
#name
print(country_series.name)
print(marks_series.name)


None
shivays_marks


In [None]:
# is_unique
print(country_series.is_unique)
print(marks_series.is_unique)

True
True


In [None]:
unique_list_test = [1,1,1,2,2,3,3,4,5,6,7,8,9]
test_is_unique= pd.Series(unique_list_test)
print(test_is_unique.is_unique)


False


In [None]:
#index
print(marks_series.index)


Index(['Hindi', 'English', 'Maths', 'Bio', 'Science'], dtype='object')


In [None]:
#index
print(country_series.index)

RangeIndex(start=0, stop=8, step=1)


In [None]:
# values
print(country_series.values)
print(marks_series.values)


['India' 'Pakistan' 'Japan' 'Bangladesh' 'USA' 'Austrilia' 'England'
 'China']
[ 30  50 100  49  87]


### Series using read_csv

In [None]:
subscribe_data= pd.read_csv('/content/sample_data/pandas/subs.csv')
subscribe_data


FileNotFoundError: [Errno 2] No such file or directory: '/content/sample_data/pandas/subs.csv'

In [None]:
# By default, read_csv returns a DataFrame.
# To get a Series from a single-column file, call .squeeze("columns") on the result
# (the squeeze=True argument of read_csv was removed in pandas 2.0).
type(subscribe_data)
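
The DataFrame-to-Series step can be tried without any file on disk. The sketch below is only an illustration: it reads a small in-memory CSV (the `csv_text` string is a hypothetical stand-in for `subs.csv`, with made-up rows) and shows the type before and after `squeeze("columns")`:

```python
import io
import pandas as pd

# Hypothetical one-column CSV standing in for subs.csv
csv_text = "Subscribers gained\n48\n57\n40\n"

df = pd.read_csv(io.StringIO(csv_text))
print(type(df))      # a DataFrame with a single column

series = df.squeeze("columns")
print(type(series))  # a Series named after that column
print(series.name)
```

`squeeze("columns")` only collapses the result when exactly one column remains, which is why the two-column files are read with `index_col` set.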

In [5]:
subscribe_data= pd.read_csv('/content/subs.csv').squeeze("columns")
subscribe_data

0       48
1       57
2       40
3       43
4       44
      ... 
360    231
361    226
362    155
363    144
364    172
Name: Subscribers gained, Length: 365, dtype: int64

In [6]:
# With two columns: use one as the index so the remaining column squeezes to a Series
kohli_ipl = pd.read_csv('/content/kohli_ipl.csv', index_col='match_no').squeeze("columns")
kohli_ipl

match_no
1       1
2      23
3      13
4      12
5       1
       ..
211     0
212    20
213    73
214    25
215     7
Name: runs, Length: 215, dtype: int64

In [7]:
movie= pd.read_csv('/content/bollywood.csv', index_col="movie").squeeze("columns")
movie


movie
Uri: The Surgical Strike                   Vicky Kaushal
Battalion 609                                Vicky Ahuja
The Accidental Prime Minister (film)         Anupam Kher
Why Cheat India                            Emraan Hashmi
Evening Shadows                         Mona Ambegaonkar
                                              ...       
Hum Tumhare Hain Sanam                    Shah Rukh Khan
Aankhen (2002 film)                     Amitabh Bachchan
Saathiya (film)                             Vivek Oberoi
Company (film)                                Ajay Devgn
Awara Paagal Deewana                        Akshay Kumar
Name: lead, Length: 1500, dtype: object

### Series methods

In [8]:
#head
# head() returns the first 5 rows by default; head(10) returns the first 10
subscribe_data.head()


0    48
1    57
2    40
3    43
4    44
Name: Subscribers gained, dtype: int64

In [9]:
#tail
# tail() returns the last 5 rows by default; tail(10) returns the last 10
subscribe_data.tail()


360    231
361    226
362    155
363    144
364    172
Name: Subscribers gained, dtype: int64

In [12]:
#sample
subscribe_data.sample(5)

80      88
73      80
135    109
215    119
3       43
Name: Subscribers gained, dtype: int64

In pandas, the sample() method randomly selects a specified number of elements from a Series (or rows or columns from a DataFrame). It is useful for taking a random sample of your data for analysis or testing.
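
Because the selection is random, the output changes from run to run. As a small sketch (the Series below is made up for illustration), passing `random_state` makes the draw repeatable, and `frac` samples a fraction of the elements instead of a fixed count:

```python
import pandas as pd

# Illustrative Series; the values are made up
s = pd.Series(range(100, 110))

# random_state fixes the seed, so the same elements are drawn on every run
first = s.sample(3, random_state=42)
second = s.sample(3, random_state=42)
print(first.equals(second))

# frac draws a fraction of the elements instead of a fixed count
half = s.sample(frac=0.5, random_state=0)
print(len(half))
```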

In [15]:
# value_counts -> movies

movie.value_counts()


Akshay Kumar        48
Amitabh Bachchan    45
Ajay Devgn          38
Salman Khan         31
Sanjay Dutt         26
                    ..
Diganth              1
Parveen Kaur         1
Seema Azmi           1
Akanksha Puri        1
Edwin Fernandes      1
Name: lead, Length: 566, dtype: int64

In pandas, the value_counts() method counts the occurrences of each unique value in a Series and returns them sorted from most to least frequent. It is useful for understanding the distribution of values in a DataFrame column.
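
A minimal sketch on a tiny Series (the names are placeholders, not rows from `bollywood.csv`) shows both the raw counts and, with `normalize=True`, the relative frequencies:

```python
import pandas as pd

# Illustrative Series; the labels are placeholders
leads = pd.Series(["A", "B", "A", "C", "A", "B"])

counts = leads.value_counts()                # sorted by count, descending
shares = leads.value_counts(normalize=True)  # relative frequencies that sum to 1

print(counts)
print(shares)
```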

In [17]:
# sort_values
kohli_ipl.sort_values()


match_no
87       0
211      0
207      0
206      0
91       0
      ... 
164    100
120    100
123    108
126    109
128    113
Name: runs, Length: 215, dtype: int64

In [18]:
kohli_ipl.sort_values(ascending=False)


match_no
128    113
126    109
123    108
164    100
120    100
      ... 
93       0
211      0
130      0
8        0
135      0
Name: runs, Length: 215, dtype: int64

Mean: the average of the values in the Series.

Median: the middle value of the sorted Series.

Mode: the most frequent value in the Series.

Standard Deviation: a measure of how far the values spread around the mean.

Variance: the average of the squared differences from the mean.
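
These statistics are all available as Series methods. The sketch below uses a small made-up Series rather than the notebook's data. One detail to keep in mind: by default pandas computes the sample standard deviation and variance (dividing by n - 1); pass `ddof=0` for the population versions.

```python
import pandas as pd

# Illustrative scores; not taken from kohli_ipl.csv
runs = pd.Series([10, 20, 20, 30, 70])

print(runs.mean())    # arithmetic average
print(runs.median())  # middle value of the sorted data
print(runs.mode())    # most frequent value(s), returned as a Series
print(runs.std())     # sample standard deviation (ddof=1 by default)
print(runs.var())     # sample variance (ddof=1 by default)
```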

### Series Maths Methods

In [None]:
# count: number of non-missing values in the Series
kohli_ipl.count()