A data structure is a collection of data values together with the operations that can be applied to them.

It enables efficient storage, retrieval, and modification of data.

Pandas offers two primary data structures: Series and DataFrame.

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integer, float, string, etc.). It is similar to a column in a table or an Excel sheet.
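To make the "labeled array" idea concrete, here is a minimal sketch. The fruit labels, prices, and quantities are made-up values used only for illustration. It shows that a value can be read by label or by position, and that arithmetic between two Series matches values by label rather than by order.

```python
import pandas as pd

# Hypothetical data: labels and values are invented for this illustration.
prices = pd.Series([1.5, 2.0, 0.75], index=['apple', 'banana', 'cherry'], name='price')

# Read one value by label (.loc) and by integer position (.iloc).
by_label = prices.loc['banana']
by_position = prices.iloc[1]

# Arithmetic aligns on labels, not on position. 'banana' has no
# matching label in quantities, so its result is NaN.
quantities = pd.Series([10, 4], index=['cherry', 'apple'])
totals = prices * quantities

print(by_label, by_position)
print(totals)
```

Label alignment is the main practical difference from a plain NumPy array, where the same multiplication would pair elements purely by position.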

In [31]:
import pandas as pd
print("\n-------------Creating a series from scalar value-----------------\n")
s_scalar = pd.Series(10,index = ['a','b','c'])
print(s_scalar)
print("\n-------------Creating a series from scalar value only one-----------------\n")
s_scalar = pd.Series(10)
print(s_scalar)
print("\n-------------Creating a series from list-----------------\n")
data = [10,20,30,40,50,60]
s = pd.Series(data)
print(s)
print("\n-------------We can specify a custom index to a list-----------------\n")
s = pd.Series(data,index = ['a','b','c','d','e','f'])
print(s)
print("\n-------------Creating a series from dictionary-----------------\n")
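data_dict = {'a':10,'b':20,'c':30,'d':40,'e':50,'f':60}  # not named `dict`, which would shadow the built-in type
print('Dict is: ',data_dict)
s = pd.Series(data_dict)
print('Series is: \n',s)
print('Indexing and Slicing(Similar to Numpy)')
print('Data: ',data,',',data_dict)
# Unlike positional slices, a label-based slice includes both endpoints, so 'a':'d' returns four rows
print('Indexed data' , data[1],',',data_dict['c'],',',s['a':'d'])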
print('\n------------------Index,Value,Name,size,shape,hasnans,is_unique,empty----------------------------\n')
data = ['New Delhi', 'Moscow', 'Kabul', 'Beijing']
index = ['India', 'Russia', 'Afghanistan', 'China']
ser = pd.Series(data,index)
ser.name = 'Capitals' #Setting the name
print(ser.index)
print(ser.values)
print(ser.name)
print(ser.dtype)
print(ser.shape)
print(ser.size)
print(ser.empty)
print(ser.hasnans)
print(ser.is_unique)
print(ser.ndim)
print(ser.nbytes)
print('\n------------------Series methods----------------------------\n')
print('head, tail, describe, value_counts - counts how often each unique value occurs')
print(ser.head(2))
print(ser.tail(2))
print(ser.describe())
print(ser.value_counts())
print(ser.sort_values())
print(ser.sort_index())
print(ser.drop('Afghanistan'))
ser.replace('Kabul','Kaabul',inplace = True)
print(ser)
print('\n------------------Handling null value-----------------\n')
ser = pd.Series([None,None,1,2,3,None,4 ])
print('fillna')
ser2 = ser.fillna(0)
print(ser2)
print('dropna')
ser2 = ser.dropna()
print(ser2)


-------------Creating a series from scalar value-----------------

a    10
b    10
c    10
dtype: int64

-------------Creating a series from scalar value only one-----------------

0    10
dtype: int64

-------------Creating a series from list-----------------

0    10
1    20
2    30
3    40
4    50
5    60
dtype: int64

-------------We can specify a custom index to a list-----------------

a    10
b    20
c    30
d    40
e    50
f    60
dtype: int64

-------------Creating a series from dictionary-----------------

Dict is:  {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50, 'f': 60}
Series is: 
 a    10
b    20
c    30
d    40
e    50
f    60
dtype: int64
Indexing and Slicing(Similar to Numpy)
Data:  [10, 20, 30, 40, 50, 60] , {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50, 'f': 60}
Indexed data 20 , 30 , a    10
b    20
c    30
d    40
dtype: int64

------------------Index,Value,Name,size,shape,hasnans,is_unique,empty----------------------------

Index(['India', 'Russia', 'Afghanistan', '
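
The null-handling step at the end of the cell above can be checked in isolation. The following is a small sketch that reuses the same values. The variable names `missing_mask`, `n_missing`, `filled`, and `dropped` are chosen here for illustration and do not appear in the original cell.

```python
import pandas as pd

ser = pd.Series([None, None, 1, 2, 3, None, 4])

# isna() returns a boolean Series that is True where a value is missing.
missing_mask = ser.isna()
n_missing = int(missing_mask.sum())

# fillna returns a new Series with missing values replaced; the length is unchanged.
filled = ser.fillna(0)

# dropna returns a new Series without the missing rows; the surviving
# rows keep their original index labels (2, 3, 4, 6).
dropped = ser.dropna()

print(n_missing)
print(filled.tolist())
print(dropped.index.tolist())
```

Neither call modifies `ser` itself. If a fresh 0-based index is wanted after `dropna`, `dropped.reset_index(drop=True)` is one way to get it.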

A DataFrame is a two-dimensional, tabular data structure provided by the Pandas library, and it is widely used for data manipulation and analysis. It is similar to a spreadsheet or SQL table, with rows and columns, where each column is a Series and all columns share the same row index. It can be built from various sources such as CSV files, databases, or raw Python data structures.
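
As a rough sketch of the "various sources" point, the example below builds the same small table twice: once from a raw Python dictionary and once from CSV text. The CSV is held in an in-memory string (via `io.StringIO`) so the example does not depend on any file on disk. The rows are illustrative values only.

```python
import io
import pandas as pd

# Source 1: a raw Python structure (a dictionary of equal-length lists).
records = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
}
df_from_dict = pd.DataFrame(records)

# Source 2: CSV text. StringIO stands in for a file here; with a real
# file, the path would be passed to read_csv instead.
csv_text = "Name,Age,City\nAlice,25,New York\nBob,30,Los Angeles\nCharlie,35,Chicago\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# Selecting a single column returns a Series that shares the row index.
age_column = df_from_dict['Age']

print(df_from_dict.shape)
print(type(age_column).__name__)
print(df_from_csv['Age'].tolist())
```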

In [41]:
import pandas as pd
import numpy as np
print('\n--------------Creating an empty dataframe-------------------\n')
df = pd.DataFrame()
print(df)
print('\n--------------Creating a dataframe from numpy array-------------------\n')
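# A NumPy array holds a single dtype; without dtype=object the ages would be converted to strings
data = np.array([[25, 'New York'],[30, 'Los Angeles'],[35, 'Chicago'],[40, 'Houston']], dtype=object)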
df = pd.DataFrame(data,columns=['age','city'])
print(df)
print('\n--------------Creating a dataframe from dictionary-------------------\n')
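# data is a list of dictionaries: each dictionary becomes one row, and its keys become column names
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'},
    {'Name': 'David', 'Age': 40, 'City': 'Houston'},
]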
df = pd.DataFrame(data,columns = ['Name','Age','City'])
print(df)
print('\n--------------Creating a dataframe from excel-------------------\n')
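try:
    customers = pd.read_excel('customers.xlsx')
    products = pd.read_excel('products.xlsx')
    purchases = pd.read_excel('purchases.xlsx')
except FileNotFoundError as e:
    # Without the files, the names above stay undefined, so stop here
    # rather than failing later with a NameError.
    print(e)
    raise
print('Type of customer dataset',type(customers))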
print('\n--------------Accessing DataFrames Element through Indexing/Slicing-------------------\n')
column_data = customers['city']
print(column_data)
print('Using loc accessor')
first_column = customers.loc[:,'id'] # ':' means select all rows of column labeled 'id'
print(first_column)
print('Using iloc accessor - It uses positional indexing')
second_column = customers.iloc[:,1]
print(second_column)
print('Selecting multiple columns')
df = products[['id','product','cost']]

df1 = products.loc[:,['id','product','cost']]

df2 = products.iloc[:,[0,2]]

print('\n--------------Access a single row----------------\n')
df = products.loc[2] #loc uses the index label
print(df)
df1 = products.iloc[3] #iloc uses position
print(df1)

print('\n--------------Boolean Indexing----------------\n')
df = products[products['id'] % 20 ==0]
print(df)
df1 = customers[customers['gender'] == 'Male']
df2 = customers[(customers['gender'] == 'Male') & (customers['id']  % 6 ==0)]
print(df2)

print('\n--------------Slicing both row and column----------------\n')
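# loc slices by label and includes both endpoints: rows labeled 2, 3 and 4
df = purchases.loc[2:4, ['product_num', 'paid']]
print(df)
# iloc slices by position and excludes the stop: rows at positions 2, 3, 4 and columns at positions 1, 2, 3
df1 = purchases.iloc[2:5,1:4]
print(df1)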


--------------Creating an empty dataframe-------------------

Empty DataFrame
Columns: []
Index: []

--------------Creating a dataframe from numpy array-------------------

  age         city
0  25     New York
1  30  Los Angeles
2  35      Chicago
3  40      Houston

--------------Creating a dataframe from dictionary-------------------

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   40      Houston

--------------Creating a dataframe from excel-------------------

Type of customer dataset <class 'pandas.core.frame.DataFrame'>

--------------Accessing DataFrames Element through Indexing/Slicing-------------------

0         San Diego
1           El Paso
2       San Antonio
3           Memphis
4               NaN
           ...     
995         Reading
996    Indianapolis
997       Waterbury
998        Savannah
999         Atlanta
Name: city, Length: 1000, dtype: object
Using loc accessor
0         1
1   