
# 1. DEFINE A PANDAS SERIES (WITH NUMERIC DEFAULT INDEX)

In [1]:
# Pandas is a data manipulation and analysis library built on top of NumPy.
# Its main data structure is the DataFrame (think of it as a Microsoft Excel sheet in Python).
# DataFrames let programmers store and manipulate data in tabular form (rows and columns).
# Series vs. DataFrame: a Series is one-dimensional and can be thought of as a single column of a DataFrame.

In [2]:
import pandas as pd

In [3]:
# Let's define a Python list that contains 5 stocks: Nvidia, Microsoft, Facebook, Amazon, and Boeing
my_list = ['NVDA', 'MSFT', 'FB', 'AMZN', 'BA']
my_list

['NVDA', 'MSFT', 'FB', 'AMZN', 'BA']

In [4]:
# Let's confirm the Datatype
type(my_list)

list

In [5]:
# Let's create a one-dimensional Pandas Series
# Let's use the pd.Series constructor to create a Series from a Python list
# Note that a Series consists of data and an associated index (a numeric index is generated automatically)
# Check the Pandas documentation for more information: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series
# The object dtype is used for text data (strings)

series_1 = pd.Series(data = my_list)
series_1

0    NVDA
1    MSFT
2      FB
3    AMZN
4      BA
dtype: object

In [6]:
# Let's confirm the Pandas Series Datatype
type(series_1)

pandas.core.series.Series

In [7]:
# Let's define another Pandas Series that contains numeric values (stock prices) instead of text data
# Note the int64 dtype, which means each value is an integer stored in 64 bits in memory
series_2 = pd.Series(data = [100, 200, 500, 1000, 2000])
series_2

0     100
1     200
2     500
3    1000
4    2000
dtype: int64
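
The dtype printed under each Series is inferred from the data passed to the constructor. The sketch below uses made-up prices (illustrative values, not taken from any dataset in this notebook) to show how integer, floating-point, and mixed numeric inputs are handled.

```python
import pandas as pd

# Illustrative values only
int_prices = pd.Series(data=[100, 200, 500])
float_prices = pd.Series(data=[100.5, 200.25, 500.0])
mixed_prices = pd.Series(data=[100, 200.5, 500])  # integers are upcast to float

print(int_prices.dtype)    # int64
print(float_prices.dtype)  # float64
print(mixed_prices.dtype)  # float64
```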

**MINI CHALLENGE #1:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite movies. Confirm the datatype of "my_series"**

In [8]:
movies_1 = ['Jab We Met', 'Titanic', 'Stranger Things']
movies_series = pd.Series(data = movies_1)
movies_series

0         Jab We Met
1            Titanic
2    Stranger Things
dtype: object

In [9]:
type(movies_series)

pandas.core.series.Series

# 2. DEFINE A PANDAS SERIES WITH CUSTOM INDEX

In [10]:
# Let's define a Python list that contains 5 stocks: Nvidia, Microsoft, Facebook, Amazon, and Boeing
list_1 = ['NVDA', 'MSFT', 'FB', 'AMZN', 'BA']

In [11]:
# Let's define a Python list as shown below. This list will be used for the Series index:
label_1 = ['stock#1', 'stock#2', 'stock#3', 'stock#4', 'stock#5']

In [12]:
# Let's create a one-dimensional Pandas Series
# Let's use the pd.Series constructor to create a Series from a Python list
# Note that this Series consists of data and associated labels
series_3 = pd.Series(data = list_1, index = label_1)

In [13]:
# Let's view the series
series_3

stock#1    NVDA
stock#2    MSFT
stock#3      FB
stock#4    AMZN
stock#5      BA
dtype: object

In [14]:
# Let's obtain the datatype
type(series_3)

pandas.core.series.Series
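
A custom index is mostly useful because elements can then be looked up by label. The following is a minimal sketch, rebuilding the same data under the hypothetical name `tickers`, that shows label-based access with `[]` and `.loc` alongside position-based access with `.iloc`.

```python
import pandas as pd

tickers = pd.Series(
    data=['NVDA', 'MSFT', 'FB', 'AMZN', 'BA'],
    index=['stock#1', 'stock#2', 'stock#3', 'stock#4', 'stock#5'],
)

print(tickers['stock#3'])      # FB   (lookup by label)
print(tickers.loc['stock#5'])  # BA   (explicit label-based indexer)
print(tickers.iloc[0])         # NVDA (explicit position-based indexer)
```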

**MINI CHALLENGE #2:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite movies. Instead of using the default numeric index (as in mini challenge #1), use the following index labels: "movie #1", "movie #2", and "movie #3"**

In [15]:
list_2 = ['Jab We Met', 'Titanic', 'Stranger Things']

In [16]:
label_2 = ['movie #1', 'movie #2', 'movie #3']

In [17]:
series_4 = pd.Series(data = list_2, index = label_2)
series_4

movie #1         Jab We Met
movie #2            Titanic
movie #3    Stranger Things
dtype: object

# 3. DEFINE A PANDAS SERIES FROM A DICTIONARY

In [18]:
# A dictionary is a collection of key-value pairs. Each pair maps a key to its corresponding value.
# Keys are unique within a dictionary, while values may repeat.
# List elements are accessed by their position (index), while dictionary elements are accessed by their keys
# Define a dictionary named "dict_1" using key-value pairs

dict_1 = {'Bank client id': 201,
          'Bank client name': 'Shrishti',
          'Net worth (INR)': 45000,
          'Years with bank': 6,}

In [19]:
# Show the dictionary
dict_1

{'Bank client id': 201,
 'Bank client name': 'Shrishti',
 'Net worth (INR)': 45000,
 'Years with bank': 6}

In [20]:
# Confirm the dictionary datatype 
type(dict_1)

dict

In [21]:
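# Let's define a Pandas Series using the dictionary: the keys become the index labels
# Because the values mix integers and a string, the resulting dtype is object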
series_5 = pd.Series(dict_1)
series_5

Bank client id           201
Bank client name    Shrishti
Net worth (INR)        45000
Years with bank            6
dtype: object
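
When a Series is built from a dictionary, an explicit `index` argument can also be passed to select and order the keys. The sketch below uses a hypothetical `prices` dictionary with illustrative values; a label that is not a key in the dictionary ends up as NaN.

```python
import pandas as pd

prices = {'NVDA': 100, 'MSFT': 200, 'FB': 500}  # illustrative prices

# Dictionary keys become the index labels, in insertion order
from_dict = pd.Series(prices)

# An explicit index selects and reorders keys; 'BA' is missing, so it becomes NaN
subset = pd.Series(prices, index=['FB', 'NVDA', 'BA'])

print(from_dict)
print(subset)
```

Note that `subset` is stored as float64, since NaN is a floating-point value.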

**MINI CHALLENGE #3:**
- **Create a Pandas Series from a dictionary with 3 of your favourite stocks and their corresponding prices** 

In [22]:
my_dict = {'Netflix': 990,
           'AMZN': 1500,
           'MSFT': 1200}

print(my_dict)

{'Netflix': 990, 'AMZN': 1500, 'MSFT': 1200}


In [23]:
series_6 = pd.Series(my_dict)
series_6

Netflix     990
AMZN       1500
MSFT       1200
dtype: int64

# 4. PANDAS ATTRIBUTES

In [24]:
# Attributes/Properties: do not use parentheses "()" and return properties of the Pandas Series. Ex: my_series.values, my_series.shape
# Methods: use parentheses "()" and may take arguments. Most return a new object and leave the original Series unchanged unless called with inplace=True (where supported). Ex: my_series.tail(), my_series.head(), my_series.drop_duplicates()
# Indexers: use square brackets "[]" and are used to access specific elements in a Pandas Series or DataFrame. Ex: my_series.loc[], my_series.iloc[]

# Let's redefine a Pandas Series containing our favourite 5 stocks 
# Nvidia, Microsoft, Facebook, Amazon, and Boeing

list_3 = ['NVDA', 'MSFT', 'FB', 'AMZN', 'BA']
series_7 = pd.Series(data = list_3)
series_7

0    NVDA
1    MSFT
2      FB
3    AMZN
4      BA
dtype: object

In [25]:
# The ".values" attribute returns the Series data as a NumPy ndarray (or an ndarray-like object, depending on the dtype)
# Check this for more information: https://pandas.pydata.org/docs/reference/api/pandas.Series.values.html#pandas.Series.values
series_7.values

array(['NVDA', 'MSFT', 'FB', 'AMZN', 'BA'], dtype=object)

In [26]:
# index is used to return the index (axis labels) of the Series
series_7.index

RangeIndex(start=0, stop=5, step=1)

In [27]:
# dtype is used to return the datatype of the Series ('O' stands for 'object' datatype)
series_7.dtype

dtype('O')

In [28]:
# Check if all elements are unique or not
series_7.is_unique

True

In [29]:
# Check the shape of the Series
# note that a Series is one dimensional
series_7.shape

(5,)
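
The three kinds of access described at the start of this section can be put side by side. This is a small sketch on the same five tickers, using the hypothetical name `tickers`.

```python
import pandas as pd

tickers = pd.Series(data=['NVDA', 'MSFT', 'FB', 'AMZN', 'BA'])

shape = tickers.shape        # attribute: no parentheses
first_two = tickers.head(2)  # method: parentheses, optional arguments
third = tickers.iloc[2]      # indexer: square brackets

print(shape)            # (5,)
print(list(first_two))  # ['NVDA', 'MSFT']
print(third)            # FB
```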

**MINI CHALLENGE #4:** 
- **What is the size of the Pandas Series? (External Research for the proper attribute is Required)**

In [30]:
series_7.size

5

# 5. PANDAS METHODS

In [31]:
# Methods use parentheses "()" and may take arguments. Ex: my_series.tail(), my_series.head(), my_series.drop_duplicates()
# The methods used here return a new value or a new Series; they do not modify the original Series

# Let's define another Pandas Series that contains numeric values (stock prices) instead of text data
# Note the int64 dtype, which means each value is an integer stored in 64 bits in memory

series_8 = pd.Series(data = [100, 200, 500, 1000, 5000])
series_8 

0     100
1     200
2     500
3    1000
4    5000
dtype: int64

In [32]:
# Let's obtain the sum of all elements in the Pandas Series
series_8.sum()

6800

In [33]:
# Let's obtain the product of all elements in the Pandas Series
series_8.product()

50000000000000

In [34]:
# Let's obtain the average
series_8.mean()

1360.0

In [35]:
# Let's show the first couple of elements in the Pandas Series
series_8.head(2)

0    100
1    200
dtype: int64

In [36]:
# Note that head returns a new Series; the original Series is left unchanged
new_series = series_8.head(3)
new_series

0    100
1    200
2    500
dtype: int64
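
To make the point about methods returning new objects concrete, here is a small sketch with illustrative prices (one value is deliberately duplicated). The original Series keeps all of its elements after `head()` and `drop_duplicates()` are called.

```python
import pandas as pd

prices = pd.Series(data=[100, 200, 200, 1000, 5000])  # illustrative, with one duplicate

top = prices.head(3)                # new Series with the first 3 elements
deduped = prices.drop_duplicates()  # new Series without the repeated 200

print(len(top))      # 3
print(len(deduped))  # 4
print(len(prices))   # 5 (the original is unchanged)
```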

**MINI CHALLENGE #5:** 
- **Show the last 2 rows in the Pandas Series (External Research is Required)** 
- **How many bytes does this Pandas Series consume in memory? (External Research is Required)**

In [37]:
series_8.tail(2)

3    1000
4    5000
dtype: int64
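
For the second part of the challenge, one possible approach is the `nbytes` attribute or the `memory_usage()` method. The sketch below rebuilds the same five prices under the hypothetical name `prices`; the total that includes the index can differ between pandas versions, so no figure is given for it.

```python
import pandas as pd

prices = pd.Series(data=[100, 200, 500, 1000, 5000])

# Five int64 values at 8 bytes each
print(prices.nbytes)                     # 40
print(prices.memory_usage(index=False))  # 40

# By default memory_usage() also counts the index
print(prices.memory_usage())
```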

# 6. IMPORT CSV DATA (1-D) USING PANDAS

In [38]:
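# Pandas read_csv reads a CSV file and returns a DataFrame by default (DataFrames will be covered shortly!)
# Passing squeeze=True converts a single-column result into a one-dimensional Pandas Series
# Note: the squeeze argument was deprecated in pandas 1.4 and removed in pandas 2.0;
# on newer versions, call .squeeze("columns") on the result of read_csv instead
# Notice that a Series is displayed as plain text, without table formatting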

sp500 = pd.read_csv('/kaggle/input/sp500prices/S_P500_Prices.csv', squeeze=True)
sp500

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2159, dtype: float64

**MINI CHALLENGE #6:**
- **Set squeeze=False and rerun the cell. What do you notice? Use type() to compare both outputs**

In [39]:
sp500_2 = pd.read_csv('/kaggle/input/sp500prices/S_P500_Prices.csv', squeeze=False)
sp500_2

            sp500
0     1295.500000
1     1289.089966
2     1293.670044
3     1308.040039
4     1314.500000
...           ...
2154  3327.770020
2155  3349.159912
2156  3351.280029
2157  3360.469971


In [40]:
type(sp500)

pandas.core.series.Series

In [41]:
type(sp500_2)

pandas.core.frame.DataFrame
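
On pandas 2.0 and later, `read_csv` no longer accepts a `squeeze` argument. A minimal sketch of the equivalent steps follows; it reads a tiny in-memory CSV (three illustrative rows standing in for the real file) so that it runs without the Kaggle dataset.

```python
import io

import pandas as pd

# Stand-in for the real CSV file: one column named "sp500"
csv_text = "sp500\n1295.5\n1289.089966\n1293.670044\n"

frame = pd.read_csv(io.StringIO(csv_text))  # always a DataFrame
series = frame.squeeze("columns")           # single column -> Series

print(type(frame).__name__)   # DataFrame
print(type(series).__name__)  # Series
print(series.name)            # sp500
```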

# 7. PANDAS BUILT-IN FUNCTIONS

In [42]:
# Pandas works well with Python's built-in functions
# You don't always need a Pandas method; many built-in functions can be applied to a Series directly
# Check Python built-in functions here: https://docs.python.org/3/library/functions.html

sp500 = pd.read_csv('/kaggle/input/sp500prices/S_P500_Prices.csv', squeeze=True)
sp500

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2159, dtype: float64

In [43]:
# Obtain the Data Type of the Pandas Series
type(sp500)

pandas.core.series.Series

In [44]:
# Obtain the length of the Pandas Series
len(sp500)

2159

In [45]:
# Obtain the maximum value of the Pandas Series
max(sp500)

3386.149902

In [46]:
# Obtain the minimum value of the Pandas Series
min(sp500)

1278.040039
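
Several built-in functions have a direct Series counterpart. The sketch below, using three illustrative prices, checks that each pair gives the same answer.

```python
import pandas as pd

prices = pd.Series(data=[1295.5, 1289.09, 1314.5])  # illustrative values

# (built-in result, pandas result) for each operation
pairs = {
    'length': (len(prices), prices.size),
    'maximum': (max(prices), prices.max()),
    'minimum': (min(prices), prices.min()),
}

for name, (builtin_result, pandas_result) in pairs.items():
    print(name, builtin_result == pandas_result)
```

One difference worth keeping in mind: Series methods such as `max()` skip missing values (NaN) by default, while the built-in functions have no special handling for them.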

**MINI CHALLENGE #7:**
- **Given the following Pandas Series, convert all positive values to negative using python built-in functions**
- **Obtain only unique values (i.e., remove duplicates) using python built-in functions**
- **my_series = pd.Series(data = [-10, 100, -30, 50, 100])**


In [47]:
-abs(sp500)

0      -1295.500000
1      -1289.089966
2      -1293.670044
3      -1308.040039
4      -1314.500000
           ...     
2154   -3327.770020
2155   -3349.159912
2156   -3351.280029
2157   -3360.469971
2158   -3333.689941
Name: sp500, Length: 2159, dtype: float64

In [48]:
sp500.is_unique

False

In [49]:
sp500 = sp500.drop_duplicates()
sp500

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2147, dtype: float64

In [50]:
sp500.is_unique

True

The actual solution, using the Series given in the challenge:

In [51]:
my_series = pd.Series(data = [-10, 100, -30, 50, 100])
my_series

0    -10
1    100
2    -30
3     50
4    100
dtype: int64

In [52]:
# abs() makes every value positive; negating the result makes every value negative
-abs(my_series)

0    -10
1   -100
2    -30
3    -50
4   -100
dtype: int64

In [53]:
set(my_series)

{-30, -10, 50, 100}

# 8. SORTING PANDAS SERIES

In [54]:
# Let's import CSV data as follows:
sp500_3 = pd.read_csv('/kaggle/input/sp500prices/S_P500_Prices.csv', squeeze=True)
sp500_3

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2159, dtype: float64

In [55]:
# You can sort the values in the Series as follows
sp500_3.sort_values()

97      1278.040039
98      1278.180054
99      1285.500000
1       1289.089966
2       1293.670044
           ...     
2038    3373.229980
2034    3373.939941
2033    3379.449951
2035    3380.159912
2037    3386.149902
Name: sp500, Length: 2159, dtype: float64

In [56]:
# Let's view the Pandas Series again after sorting. Note that the original Series is unchanged: sort_values() returned a new, sorted Series. To modify the original, set inplace=True
sp500_3

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2159, dtype: float64

In [57]:
# Set inplace=True so that the Series itself is sorted, instead of a sorted copy being returned
sp500_3.sort_values(inplace=True)
sp500_3

97      1278.040039
98      1278.180054
99      1285.500000
1       1289.089966
2       1293.670044
           ...     
2038    3373.229980
2034    3373.939941
2033    3379.449951
2035    3380.159912
2037    3386.149902
Name: sp500, Length: 2159, dtype: float64

In [58]:
# Note that now the change (ordering) took place 
sp500_3

97      1278.040039
98      1278.180054
99      1285.500000
1       1289.089966
2       1293.670044
           ...     
2038    3373.229980
2034    3373.939941
2033    3379.449951
2035    3380.159912
2037    3386.149902
Name: sp500, Length: 2159, dtype: float64

In [59]:
# Notice that the index labels are now out of order, because each label stays attached to its value
# You can also sort by index (which restores the original order of the Pandas Series) as follows: 
sp500_3.sort_index(inplace=True)
sp500_3

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2159, dtype: float64

**MINI CHALLENGE #8:**
- **Sort the S&P500 values in descending order instead. Make sure to update the values in memory.**

In [60]:
sp500_3.sort_values(ascending=False, inplace=True)
sp500_3

2037    3386.149902
2035    3380.159912
2033    3379.449951
2034    3373.939941
2038    3373.229980
           ...     
2       1293.670044
1       1289.089966
99      1285.500000
98      1278.180054
97      1278.040039
Name: sp500, Length: 2159, dtype: float64
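
An alternative to `inplace=True` is to assign the result of the sort to a variable. The sketch below uses four illustrative prices to show a descending sort followed by `sort_index()`, which brings back the original order.

```python
import pandas as pd

prices = pd.Series(data=[1295.5, 1289.09, 1293.67, 1308.04])  # illustrative values

descending = prices.sort_values(ascending=False)  # new Series, largest first
restored = descending.sort_index()                # back to label order 0, 1, 2, 3

print(list(descending.index))   # [3, 0, 2, 1]
print(restored.equals(prices))  # True
```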

# 9. PERFORM MATH OPERATIONS ON PANDAS SERIES

In [61]:
# Let's import CSV data as follows:
sp500_4 = pd.read_csv('/kaggle/input/sp500prices/S_P500_Prices.csv', squeeze=True)
sp500_4

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2159, dtype: float64

In [62]:
# Apply Sum Method on Pandas Series
sp500_4.sum()

4790280.287214

In [63]:
# Apply count Method on Pandas Series
sp500_4.count()

2159

In [64]:
# Obtain the maximum value
sp500_4.max()

3386.149902

In [65]:
# Obtain the minimum value
sp500_4.min()

1278.040039

In [66]:
# My favourite: describe()! 
# describe() returns the main summary statistics (count, mean, std, min, quartiles, max) in one place 

sp500_4.describe()

count    2159.000000
mean     2218.749554
std       537.321727
min      1278.040039
25%      1847.984985
50%      2106.629883
75%      2705.810059
max      3386.149902
Name: sp500, dtype: float64

**MINI CHALLENGE #9:**
- **Obtain the average price of the S&P500 using two different methods**

In [67]:
sp500_4.mean()

2218.7495540592868
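
The challenge asks for two different methods. A few possible alternatives to `mean()` are sketched below on a small illustrative Series (hypothetical name `prices`), since the CSV file is not needed to show the idea.

```python
import pandas as pd

prices = pd.Series(data=[100, 200, 500, 1000, 5000])  # illustrative values

mean_method = prices.mean()                  # built-in Series method
mean_manual = prices.sum() / prices.count()  # sum divided by number of elements
mean_describe = prices.describe()['mean']    # picked out of the describe() summary

print(mean_method)    # 1360.0
print(mean_manual)    # 1360.0
print(mean_describe)  # 1360.0
```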

# 10. CHECK IF A GIVEN ELEMENT EXISTS IN A PANDAS SERIES

In [68]:
# Let's import CSV data as follows:
sp500_5 = pd.read_csv('/kaggle/input/sp500prices/S_P500_Prices.csv', squeeze=True)
sp500_5

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2159, dtype: float64

In [69]:
# Check if a given number exists in a Pandas Series values
# Returns a boolean "True" or "False"

1295.500000 in sp500_5.values

True

In [70]:
# Check if a given number exists in a Pandas Series index
32.5 in sp500_5.index

False

In [71]:
# Note that by default 'in' will search in Pandas index and not values
45 in sp500_5

True

**MINI CHALLENGE #10:**
- **Check if the stock price 3349 exists in the sp500 Pandas Series or not**
- **Round stock prices to the nearest integer and check again**

In [72]:
3349 in sp500_5.values

False

In [73]:
sp500_5 = round(sp500_5)
sp500_5

0       1296.0
1       1289.0
2       1294.0
3       1308.0
4       1314.0
         ...  
2154    3328.0
2155    3349.0
2156    3351.0
2157    3360.0
2158    3334.0
Name: sp500, Length: 2159, dtype: float64

In [74]:
3349 in sp500_5.values

True
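
The difference between searching the index and searching the values is easy to trip over, so here is a compact sketch with three illustrative prices. It also shows `isin()`, a Series method that can be used for the same kind of membership test on values.

```python
import pandas as pd

prices = pd.Series(data=[1295.5, 1289.089966, 3349.159912])  # illustrative values

in_index = 2 in prices                      # True: 2 is an index label
in_index_3349 = 3349 in prices              # False: 3349 is not an index label
in_values = 3349 in prices.values           # False: the stored value is 3349.159912
in_rounded = 3349 in prices.round().values  # True after rounding
via_isin = bool(prices.round().isin([3349]).any())  # True, using a Series method

print(in_index, in_index_3349, in_values, in_rounded, via_isin)
```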

# 11. INDEXING: OBTAIN SPECIFIC ELEMENTS FROM PANDAS SERIES

In [75]:
# Let's import CSV data as follows:

sp500_6 = pd.read_csv('/kaggle/input/sp500prices/S_P500_Prices.csv', squeeze=True)
sp500_6

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2159, dtype: float64

In [76]:
# Obtain the first element in a Pandas Series
# Note that with the default numeric index the first element has the label 0, so [0] looks it up by label

sp500_6[0]

1295.5

In [77]:
sp500_6[200]

1411.939941

In [78]:
# Obtain the last element in the Pandas Series
sp500_6[2158]

3333.689941

**MINI CHALLENGE #11:**
- **Obtain the fifth element in the Pandas Series**

In [79]:
sp500_6[4]

1314.5
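
With the default numeric index, labels and positions happen to coincide, which hides the difference between the two. The sketch below, on five illustrative prices, sorts the Series so that they no longer match and then uses `.loc` (by label) and `.iloc` (by position) explicitly.

```python
import pandas as pd

prices = pd.Series(data=[1295.5, 1289.09, 1293.67, 1308.04, 1314.5])  # illustrative values
sorted_prices = prices.sort_values()

by_label = sorted_prices.loc[0]      # element whose label is 0
by_position = sorted_prices.iloc[0]  # first element after sorting
last = prices.iloc[-1]               # last element, without knowing the length
fifth = prices.iloc[4]               # fifth element (positions start at 0)

print(by_label)     # 1295.5
print(by_position)  # 1289.09
print(last)         # 1314.5
print(fifth)        # 1314.5
```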

# 12. SLICING: OBTAIN MULTIPLE ELEMENTS FROM PANDAS SERIES

In [80]:
# Let's import CSV data as follows:
sp500 = pd.read_csv('/kaggle/input/sp500prices/S_P500_Prices.csv', squeeze=True)
sp500

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2159, dtype: float64

In [81]:
# Slice elements from a Pandas Series
# Let's obtain elements starting from index 0 up until and not including index 5 (ie: indexes 0-4)

sp500[0:5]

0    1295.500000
1    1289.089966
2    1293.670044
3    1308.040039
4    1314.500000
Name: sp500, dtype: float64

In [82]:
# obtain all elements starting from index 0 up until and not including index 10
sp500[:10]

0    1295.500000
1    1289.089966
2    1293.670044
3    1308.040039
4    1314.500000
5    1315.380005
6    1316.000000
7    1314.650024
8    1326.060059
9    1318.430054
Name: sp500, dtype: float64

In [83]:
# obtain all elements starting from index 5 up until the end of the Pandas Series
sp500[5:]

5       1315.380005
6       1316.000000
7       1314.650024
8       1326.060059
9       1318.430054
           ...     
2154    3327.770020
2155    3349.159912
2156    3351.280029
2157    3360.469971
2158    3333.689941
Name: sp500, Length: 2154, dtype: float64

**MINI CHALLENGE #12:**
- **Obtain all elements in Pandas Series except for the last 3 elements**

In [84]:
sp500[:-3]

0       1295.500000
1       1289.089966
2       1293.670044
3       1308.040039
4       1314.500000
           ...     
2151    3271.120117
2152    3294.610107
2153    3306.510010
2154    3327.770020
2155    3349.159912
Name: sp500, Length: 2156, dtype: float64
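
Slices can also be written with the explicit indexers. One detail to be aware of, sketched below on five illustrative prices: `.iloc` excludes the end position, like a Python list, while `.loc` includes the end label.

```python
import pandas as pd

prices = pd.Series(data=[1295.5, 1289.09, 1293.67, 1308.04, 1314.5])  # illustrative values

by_position = prices.iloc[0:3]  # positions 0, 1, 2 (end excluded)
by_label = prices.loc[0:3]      # labels 0, 1, 2, 3 (end included)
all_but_last_three = prices.iloc[:-3]

print(list(by_position.index))         # [0, 1, 2]
print(list(by_label.index))            # [0, 1, 2, 3]
print(list(all_but_last_three.index))  # [0, 1]
```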