Pandas is used in data science, machine learning, finance, analytics and automation because it integrates smoothly with other libraries such as:

NumPy: numerical operations
Matplotlib and Seaborn: data visualization
SciPy: statistical analysis
Scikit-learn: machine learning workflows
With Pandas, you can load, clean, transform, analyze and visualize data in just a few lines of code.

Data Structures in Pandas
1. Pandas Series
A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects etc.). The axis labels are collectively called the index. A Series can be created from a list, a NumPy array or a dictionary, or by selecting a single column of data loaded from existing storage such as a SQL database, a CSV file or an Excel file.

In [12]:
import pandas as pd
import numpy as np
s = pd.Series()  # empty Series
print("pandas series:", s)
d = np.array(['g', 'e', 'e', 'k', 's'])
print("numpy array:", d)
s = pd.Series(d)  # Series built from the NumPy array, with a default 0..n-1 index
print("pandas series:", s)

pandas series: Series([], dtype: object)
numpy array: ['g' 'e' 'e' 'k' 's']
pandas series: 0    g
1    e
2    e
3    k
4    s
dtype: object
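
The index does not have to be the default 0, 1, 2, ... sequence. As a minimal sketch, the same letters can be given custom labels and then looked up either by label or by position. The labels 'a' to 'e' are an assumption made for illustration and are not part of the example above.

```python
import pandas as pd

# Build a Series from a plain Python list and supply custom index labels.
# The labels 'a'..'e' are illustrative only.
s = pd.Series(['g', 'e', 'e', 'k', 's'], index=['a', 'b', 'c', 'd', 'e'])

print(s['c'])         # label-based lookup
print(s.iloc[0])      # position-based lookup
print(list(s.index))  # the axis labels
```

Label-based access is what distinguishes a Series from a plain NumPy array, where only positions are available.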


2. Pandas DataFrame
A Pandas DataFrame is a two-dimensional data structure with labeled axes (rows and columns). It is typically created by loading a dataset from existing storage such as a SQL database, a CSV file or an Excel file. It can also be built in memory from lists, dictionaries, a list of dictionaries etc.

In [17]:
import pandas as pd
df = pd.DataFrame()  # empty DataFrame
print(df)
words = ["geeks", "for", "geeks"]  # named `words` so the built-in `list` is not shadowed
df = pd.DataFrame(words)
print(df)

Empty DataFrame
Columns: []
Index: []
       0
0  geeks
1    for
2  geeks
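
The text also mentions dictionaries and lists of dictionaries as inputs. A short sketch of both follows. The column names `word` and `length` are made up for illustration.

```python
import pandas as pd

# Dictionary of lists: each key becomes a column name.
df_from_dict = pd.DataFrame({'word': ['geeks', 'for', 'geeks'],
                             'length': [5, 3, 5]})

# List of dictionaries: each dictionary becomes one row.
df_from_rows = pd.DataFrame([{'word': 'geeks', 'length': 5},
                             {'word': 'for', 'length': 3}])

print(df_from_dict)
print(df_from_rows.shape)  # (rows, columns)
```

Passing a dictionary gives the columns meaningful names, instead of the default `0` column produced when a bare list is passed.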


Operations in Pandas

1. Loading Data: This operation reads data from files such as CSV, Excel or JSON into a DataFrame.

In [23]:
import pandas as pd

df = pd.read_csv(r"C:\Users\VICTUS\OneDrive\Desktop\parent\Machine learning and data science\datasets\data_for_pandas_day16.csv")
print(df.head())

   Duration          Date  Pulse  Maxpulse  Calories
0        60  '2020/12/01'    110       130     409.1
1        60  '2020/12/02'    117       145     479.0
2        60  '2020/12/03'    103       135     340.0
3        45  '2020/12/04'    109       175     282.4
4        45  '2020/12/05'    117       148     406.0
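
The path above only exists on the author's machine. To try `read_csv` without that file, one option is to read CSV text from memory instead. The sketch below reuses the first three rows of the output above and wraps them in `io.StringIO`. The variable names `csv_text` and `sample` are assumptions.

```python
import io
import pandas as pd

# First three rows of the dataset, copied from the output above.
csv_text = """Duration,Date,Pulse,Maxpulse,Calories
60,'2020/12/01',110,130,409.1
60,'2020/12/02',117,145,479.0
60,'2020/12/03',103,135,340.0
"""

# read_csv accepts any file-like object, not only a path on disk.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())
print(sample.dtypes)
```

Excel and JSON files are read in a similar way with `pd.read_excel` and `pd.read_json`.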


2. Viewing and Exploring Data: After loading data, it is important to understand its structure and content. These methods allow you to inspect rows, summary statistics and metadata.

In [26]:
df.info()  # info() prints its report directly and returns None, so print() is not needed

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Duration  32 non-null     int64  
 1   Date      31 non-null     object 
 2   Pulse     32 non-null     int64  
 3   Maxpulse  32 non-null     int64  
 4   Calories  30 non-null     float64
dtypes: float64(1), int64(3), object(1)
memory usage: 1.4+ KB
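
Besides `info()`, a few other calls are commonly used for a first look at a DataFrame. The sketch below runs them on a small stand-in frame built from the first five rows shown earlier. The name `df_small` is an assumption, and the `Date` column is left out for brevity.

```python
import pandas as pd

# Stand-in frame built from the first five rows of the dataset.
df_small = pd.DataFrame({
    'Duration': [60, 60, 60, 45, 45],
    'Pulse': [110, 117, 103, 109, 117],
    'Maxpulse': [130, 145, 135, 175, 148],
    'Calories': [409.1, 479.0, 340.0, 282.4, 406.0],
})

print(df_small.shape)       # (rows, columns)
print(df_small.tail(2))     # last two rows
print(df_small.describe())  # count, mean, std, min, quartiles and max per numeric column
```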


3. Handling Missing Data: Datasets often contain empty or missing values. Pandas provides functions to detect, remove or replace these values.


In [29]:
print(df.isnull().sum())

Duration    0
Date        1
Pulse       0
Maxpulse    0
Calories    2
dtype: int64


In [33]:
df = df.fillna(0)  # replace every missing value, in every column, with 0
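
Filling everything with 0 also puts a 0 into the `Date` column and can distort averages of `Calories`. Two common alternatives are sketched below on a small hypothetical frame: dropping incomplete rows, and filling each column with a value suited to its type. The frame `df_gaps` and the placeholder `'unknown'` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing date and one missing calorie value.
df_gaps = pd.DataFrame({
    'Date': ["'2020/12/01'", None, "'2020/12/03'"],
    'Calories': [409.1, 479.0, np.nan],
})

# Option 1: drop every row that has at least one missing value.
dropped = df_gaps.dropna()

# Option 2: fill each column with its own replacement value.
filled = df_gaps.fillna({'Date': 'unknown',
                         'Calories': df_gaps['Calories'].mean()})

print(dropped)
print(filled)
```

Which option is appropriate depends on the analysis: dropping rows loses data, while filling introduces values that were never measured.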

4. Selecting and Filtering Data: This operation retrieves specific columns, rows or records that match a condition. It allows precise extraction of required information.

In [36]:
pulse = df[df['Pulse'] < 100]
print(pulse)

    Duration          Date  Pulse  Maxpulse  Calories
9         60  '2020/12/10'     98       124     269.0
15        60  '2020/12/15'     98       123     275.0
16        60  '2020/12/16'     98       120     215.2
18        45  '2020/12/18'     90       112       0.0
20        45  '2020/12/20'     97       125     243.0
27        60  '2020/12/27'     92       118     241.0
31        60  '2020/12/31'     92       115     243.0
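
The example above filters rows on a single condition. Selecting columns and combining conditions might look like the following sketch, run on a stand-in frame built from the first five rows of the dataset. The names `df_small`, `subset`, `long_and_intense` and `calories_short` are assumptions.

```python
import pandas as pd

df_small = pd.DataFrame({
    'Duration': [60, 60, 60, 45, 45],
    'Pulse': [110, 117, 103, 109, 117],
    'Calories': [409.1, 479.0, 340.0, 282.4, 406.0],
})

# Select a subset of columns by passing a list of names.
subset = df_small[['Duration', 'Pulse']]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
long_and_intense = df_small[(df_small['Duration'] == 60) & (df_small['Pulse'] > 105)]

# .loc selects rows by condition and a column by label in one step.
calories_short = df_small.loc[df_small['Duration'] == 45, 'Calories']

print(subset)
print(long_and_intense)
print(calories_short)
```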


5. Adding and Removing Columns: You can create new columns based on existing ones or delete unwanted columns from the DataFrame.

In [39]:
df['total Pulse'] = df['Pulse'] + df['Maxpulse']
print(df.head())

   Duration          Date  Pulse  Maxpulse  Calories  total Pulse
0        60  '2020/12/01'    110       130     409.1          240
1        60  '2020/12/02'    117       145     479.0          262
2        60  '2020/12/03'    103       135     340.0          238
3        45  '2020/12/04'    109       175     282.4          284
4        45  '2020/12/05'    117       148     406.0          265
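
The example above only adds a column. Removing one can be sketched with `drop`, shown here on a small stand-in frame that uses the first three rows of the dataset. The names `df_small` and `without_total` are assumptions.

```python
import pandas as pd

df_small = pd.DataFrame({
    'Pulse': [110, 117, 103],
    'Maxpulse': [130, 145, 135],
})
df_small['total Pulse'] = df_small['Pulse'] + df_small['Maxpulse']

# drop() returns a new DataFrame by default; df_small itself is unchanged.
without_total = df_small.drop(columns=['total Pulse'])

print(without_total)
```

Because `drop` does not modify the original by default, the result has to be assigned to a variable to be kept.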


6. Grouping Data (GroupBy): Grouping allows you to organize data into categories and compute a value for each group, for example a sum, count or average.

In [42]:
import pandas as pd

data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
        'Sales': [1000, 500, 800, 300]}

df = pd.DataFrame(data)

# Group by 'Category' and sum 'Sales'
grouped = df.groupby('Category')['Sales'].sum()

print(grouped)

Category
Clothing        800
Electronics    1800
Name: Sales, dtype: int64
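
Several aggregates can be computed in one pass with `agg`. The sketch below reuses the same sales data. The name `summary` is an assumption.

```python
import pandas as pd

data = {'Category': ['Electronics', 'Clothing', 'Electronics', 'Clothing'],
        'Sales': [1000, 500, 800, 300]}
df = pd.DataFrame(data)

# One row per category, one column per aggregate.
summary = df.groupby('Category')['Sales'].agg(['sum', 'mean', 'count'])

print(summary)
```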
