Pandas
Pandas is a Python library for working with data in tables, much as Excel works with spreadsheets.

It lets you:
	•	Read data from files (like CSV or Excel)
	•	Look at it in a table format
	•	Clean it if it’s messy
	•	Sort, filter, and analyze it
	•	Save the clean data again

Pandas is widely used in data analysis, data science, and machine learning because most real-world data arrives in tabular form, with rows and columns, and Pandas provides concise, expressive tools for working with that kind of data.



Pandas gives you 2 main tools:
	1.	Series → a single column (like a list with labels)
	2.	DataFrame → a full table (like an Excel sheet)
	•	It works on top of NumPy: it uses NumPy arrays under the hood for fast calculations and adds labels (column names and a row index) so the data is easier to read and work with.
	•	It’s made for structured data (tables), which is the kind of data you usually get in the real world: CSVs, spreadsheets, databases, survey results, etc.
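
As a minimal sketch of the two structures described above, the cell below builds one Series and one DataFrame. The names and numbers (scores, table, alice, and so on) are made up purely for illustration.

```python
import pandas as pd

# A Series: one column of values, each with a label (hypothetical example data).
scores = pd.Series([88, 92, 79], index=["alice", "bob", "carol"], name="score")

# A DataFrame: a table whose columns share the same row labels.
table = pd.DataFrame(
    {"score": [88, 92, 79], "passed": [True, True, False]},
    index=["alice", "bob", "carol"],
)

print(scores["bob"])         # look up one value by its label
print(table.loc["carol"])    # look up one row by its label
print(type(table["score"]))  # each column of a DataFrame is itself a Series
```

Selecting a single column from the DataFrame gives back a Series, which is why the two structures are usually learned together.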

Why Pandas is Important
	•	It’s the usual starting point for data science and machine learning projects that work with tabular data
	•	It helps you explore and clean data before you can build charts or train models
	•	It’s much easier to use than raw Python or NumPy when working with tabular data


🧮 NumPy (Numerical Python)

NumPy is a Python library that focuses on numerical operations and high-performance computations using arrays.
	•	It works with n-dimensional arrays (like lists of numbers, grids, or even 3D arrays).
	•	It’s mainly used for mathematical operations, linear algebra, matrix multiplications, and scientific calculations.
	•	It’s extremely fast because it uses optimized C code underneath.
	•	Data in NumPy doesn’t have column or row labels — just index positions (like array[0][1]).
	•	It’s perfect when you need to work with large sets of numbers or mathematical modeling, such as building machine learning algorithms from scratch.
	•	However, for real-world data like customer information or CSV files, NumPy alone is not very intuitive.
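
The points above can be sketched with a small 2D array. The values are arbitrary and only meant to show position-based indexing and whole-array arithmetic.

```python
import numpy as np

# A 2D array (2 rows, 3 columns) with no row or column labels.
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(grid.shape)   # dimensions of the array
print(grid[0][1])   # row 0, column 1, accessed by position
print(grid[0, 1])   # the same element, using NumPy's tuple indexing

doubled = grid * 2               # arithmetic applies to every element at once
col_means = grid.mean(axis=0)    # mean of each column
print(doubled)
print(col_means)
```

Nothing in the array records what each column means, which is the gap that pandas labels fill.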

⸻

🐼 Pandas (Python Data Analysis Library)

Pandas is a library built on top of NumPy that makes working with structured/tabular data much easier. Behind the scenes, NumPy acts as the engine in most cases; for now, think of Pandas as the dashboard on top of it.
	•	It introduces two main data structures: Series (1D labeled array) and DataFrame (2D labeled table).
	•	With Pandas, you can easily read data from files like CSVs, Excel sheets, and databases.
	•	It allows you to filter, sort, group, and clean data using human-friendly row and column labels.
	•	It’s designed to make real-world data handling simple and efficient, especially when data is messy or incomplete.
	•	While it’s not as fast at pure number crunching as NumPy, it makes data analysis workflows far more convenient.
	•	In most data science projects, Pandas is used first to clean and explore the data, and then NumPy may be used underneath for the calculations.
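
A rough sketch of the read, clean, filter, sort, and group workflow listed above. The CSV text and its column names (city, region, sales) are invented for this example, and it is read from an in-memory string rather than a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content; note the missing sales value for Bergen.
csv_text = """city,region,sales
Oslo,north,120
Bergen,north,
Madrid,south,200
Seville,south,90
"""

df = pd.read_csv(io.StringIO(csv_text))   # read: same call works with a file path

clean = df.dropna(subset=["sales"])                    # clean: drop rows with no sales
big = clean[clean["sales"] > 100]                      # filter: keep rows above 100
ranked = clean.sort_values("sales", ascending=False)   # sort: highest sales first
totals = clean.groupby("region")["sales"].sum()        # group: total sales per region

print(big)
print(ranked)
print(totals)
```

Every step refers to columns by name, which is what makes the workflow readable compared with positional array indexing.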


Pandas is built using NumPy (for data handling and speed), Cython (for performance), and Python (for structure), and it wraps them into a simple, powerful toolkit for working with structured data.

In [None]:
import pandas as pd 
import numpy as np
#You do not need to import NumPy just because Pandas is built on top of it.
#You only import NumPy when you plan to use NumPy directly in your own code.

We will analyze the Group of Seven (G7), a political forum formed by Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States. We start with population, which we store in a pandas.Series object.

A data structure is a way of storing data in a computer so that it can be used efficiently.
Pandas has two main data structures:
1. pandas.Series
2. pandas.DataFrame

Think of data structures like containers:
	•	A Series is like a single column of labeled data (e.g., a list of countries with their populations).
	•	A DataFrame is like an Excel sheet — a table with rows and columns.

In [2]:
# Population in Millions
g7_population = pd.Series([35.467, 63.941, 80.940, 60.665, 127.061, 64.511, 318.523])

pd.Series() is the constructor for the Series class provided by the Pandas library. It stores the data in a Series object, a one-dimensional labeled array. It’s like a dictionary or an Excel column.

In [3]:
g7_population

0     35.467
1     63.941
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
dtype: float64

In [4]:
g7_population.name = "Group Seven Population in Millions"

In [None]:
g7_population
#We can add a name to the pandas series

0     35.467
1     63.941
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: Group Seven Population in Millions, dtype: float64

Series are pretty similar to numpy arrays.

In [6]:
g7_population.dtype

dtype('float64')

In [None]:
g7_population.values
# This shows the underlying data in a Series

array([ 35.467,  63.941,  80.94 ,  60.665, 127.061,  64.511, 318.523])

In [None]:
type(g7_population.values)
# Here the values are stored in a NumPy array, so a Series is a labeled wrapper around that array (not a NumPy array itself)

numpy.ndarray

In [9]:
#We can access its elements similar to extracting the elements in a python list
g7_population[0]

np.float64(35.467)

In [10]:
g7_population[1]

np.float64(63.941)

In [12]:
g7_population.index

RangeIndex(start=0, stop=7, step=1)

A pandas Series and a Python list both have an index. A Series displays its index when printed, while a list's index is implicit (positions 0, 1, 2, ...). The main difference is that the index of a Series can be changed: we can assign our own labels to it.

This is deliberate, and it is a large part of what sets pandas apart. A changeable index lets data be labeled, aligned, and analyzed by meaningful names rather than positions, which is essential for real-world data science and analytics.
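
To illustrate what label alignment means in practice, here is a small sketch. The population figures come from this notebook, while the growth numbers are hypothetical and deliberately listed in a different order.

```python
import pandas as pd

population = pd.Series({"Canada": 35.467, "France": 63.941, "Japan": 127.061})

# Hypothetical yearly growth in millions, in a different order on purpose.
growth = pd.Series({"Japan": 0.5, "Canada": 1.0, "France": 0.25})

# pandas matches values by label, not by position, before adding them.
projected = population + growth
print(projected)
```

With plain lists or NumPy arrays the same addition would pair values by position, so Canada's population would be added to Japan's growth.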

In [None]:
g7_population.index = [
    "Canada",
    "France",
    "Germany",
    "Italy",
    "Japan",
    "United Kingdom",
    "United States",
]

In [None]:
g7_population
# Replacing the index makes the data easier to read.
# From now on we refer to entries by meaningful names instead of index numbers.

Canada             35.467
France             63.941
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: Group Seven Population in Millions, dtype: float64

The output above looks less like a list and more like a dictionary: key-value pairs, with added features for data analysis. It follows that a Series can also be created from a dictionary.

Key similarities and differences between a dictionary and a pandas Series

Label-Based Access

A Python dict allows access to values using keys. For example, my_dict["Canada"] returns the value associated with the key “Canada”.

Similarly, a pandas.Series allows access to values using custom index labels. For example, my_series["Canada"] returns the corresponding value from the Series.
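
A short sketch of the two lookups side by side, using the my_dict and my_series names from the text with two of the G7 values:

```python
import pandas as pd

my_dict = {"Canada": 35.467, "France": 63.941}
my_series = pd.Series(my_dict)

print(my_dict["Canada"])        # dictionary lookup by key
print(my_series["Canada"])      # Series lookup by index label
print(my_series.loc["Canada"])  # .loc makes the label-based lookup explicit

# Unlike a dict, a Series also accepts a list of labels and returns a new Series.
print(my_series[["France", "Canada"]])
```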

⸻

Order Preservation

In Python, dictionaries preserve insertion order only from version 3.7 onwards. In earlier versions, the order of keys was not guaranteed.

Pandas Series always preserves the order of elements. When you define values and labels in a specific order, that order is maintained throughout operations unless explicitly changed.

⸻

Vectorized Operations

Dictionaries do not support vectorized operations. You cannot perform mathematical operations on all values of a dictionary at once. You would need to loop through or use comprehensions.

Pandas Series support vectorized operations. You can perform operations like addition, subtraction, filtering, or applying mathematical functions directly on the entire Series without using loops.
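
The contrast can be sketched as follows, converting populations from millions to individual people. The variable names are illustrative.

```python
import pandas as pd

populations = {"Canada": 35.467, "France": 63.941, "Japan": 127.061}

# With a dict, a comprehension has to visit every value explicitly.
dict_in_people = {country: value * 1_000_000 for country, value in populations.items()}

# With a Series, the operation applies to all values in one expression.
series = pd.Series(populations)
series_in_people = series * 1_000_000

# Filtering is also vectorized: the comparison yields a boolean Series used as a mask.
large = series[series > 60]

print(series_in_people)
print(large)
```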

⸻

Support for Missing Values (NaN)

Dictionaries have no built-in support for representing or handling missing values. If a key is absent, accessing it raises a KeyError.

Pandas Series can contain missing values using NaN (Not a Number), which makes it useful for real-world datasets that may have incomplete or missing entries.
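
A minimal sketch of both behaviors. The missing value for France is introduced artificially here to have something to handle; it is not a gap in the real data.

```python
import numpy as np
import pandas as pd

# France is set to NaN purely for illustration.
pop = pd.Series({"Canada": 35.467, "France": np.nan, "Japan": 127.061})

print(pop.isna())   # boolean mask marking the missing entry
print(pop.mean())   # NaN values are skipped by default

filled = pop.fillna(0.0)   # replace missing values with a default
dropped = pop.dropna()     # or remove them entirely

# A dict has no equivalent: an absent key simply raises KeyError.
plain = {"Canada": 35.467}
try:
    plain["Spain"]
except KeyError as err:
    print("KeyError:", err)
```

Whether to fill or drop missing values depends on the analysis; both calls return a new Series and leave pop unchanged.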

⸻

Underlying Data Type

A Python dictionary is a built-in data type and is part of the core Python language. It does not rely on external libraries.

A Pandas Series is built on top of NumPy arrays. This gives it high performance and access to numerical operations that are not available in basic Python structures.


Converting dictionaries or JSON into Pandas Series is a common step in data ingestion and preparation, enabling the analyst to move from raw data to a structured format that’s easy to analyze and manipulate.
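
As one possible sketch of that ingestion step, the cell below parses a JSON string into a dictionary and then into a Series. The JSON text and the name population_millions are assumptions for the example; in practice the text would come from a file or an API response.

```python
import json
import pandas as pd

# Hypothetical JSON payload holding three of the G7 populations (in millions).
raw = '{"Canada": 35.467, "France": 63.941, "Germany": 80.94}'

record = json.loads(raw)                               # JSON text -> Python dict
series = pd.Series(record, name="population_millions") # dict -> labeled Series

print(series)
print(series.sort_values(ascending=False).index[0])    # label of the largest value
```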

Ways to create a pandas Series

In [None]:
# 1. From a dictionary: the keys become the index labels
g7_population = pd.Series({
    "Canada": 35.467,
    "France": 63.941,
    "Germany": 80.940,
    "Italy": 60.665,
    "Japan": 127.061,
    "United Kingdom": 64.511,
    "United States": 318.523
})

print(g7_population)

Canada             35.467
France             63.941
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
dtype: float64


In [17]:
# 2. From a list of values plus an explicit index
g7_population = pd.Series(
    [35.467, 63.941, 80.940, 60.665, 127.061, 64.511, 318.523],
    index=["Canada", "France", "Germany", "Italy", "Japan", "United Kingdom", "United States"]
)

print(g7_population)

Canada             35.467
France             63.941
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
dtype: float64


In [None]:
pd.Series(g7_population, index = ["France", "Germany", "Italy", "Spain"])
# This line:
#   - creates a subset of the original Series,
#   - orders the entries to match the given index,
#   - fills in NaN for any index label not present in the original (like "Spain" here).

France     63.941
Germany    80.940
Italy      60.665
Spain         NaN
dtype: float64

Notes on the call above:
	•	pd.Series(original_series, index=[...]) creates a new Series; the original is left unchanged.
	•	The result is discarded unless you assign it to a variable.
	•	It’s useful for selecting, reordering, or introducing missing entries in a structured way.
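
The same result can also be obtained with the Series.reindex method, which some readers may find more explicit. The sketch below compares the two on a shortened copy of the G7 data; the variable names are illustrative.

```python
import pandas as pd

g7 = pd.Series({"Canada": 35.467, "France": 63.941, "Germany": 80.940, "Italy": 60.665})
labels = ["France", "Germany", "Italy", "Spain"]

subset = pd.Series(g7, index=labels)   # the form used in the cell above
same = g7.reindex(labels)              # equivalent, using reindex

print(subset)
print(subset.equals(same))     # both approaches give the same Series
print("Spain" in g7.index)     # the original Series was not modified
```

Assigning the result to a variable, as done here with subset, is what keeps it around for later use.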