
# Khipus.ai  
## Fundamentals of Data Science  
### Python (Pandas)  
<span>© Copyright Notice 2025, Khipus.ai - All Rights Reserved.</span>


## Pandas Package
Pandas provides fast, easy-to-use data structures for working with labeled and tabular data:

- Built on top of NumPy.

- Used in many data science projects around the world.

- By convention, pandas is imported using “pd” as an alias.

Main data structures in pandas:

Series
- A one-dimensional, indexed sequence of values.
- Looks like a NumPy array with a label attached to each element.

DataFrame
- Each column in a DataFrame can have a different data type.
- Looks like a table.

In [1]:
import pandas as pd
import numpy as np

In [4]:
# Create a simple dataset of people
data = {'Name': ["John", "Anna", "Peter"],
        'Location' : ["New York", "Paris", "Berlin"],
        'Age' : [24, 13, 53]
       }

df = pd.DataFrame(data)

print(df)

    Name  Location  Age
0   John  New York   24
1   Anna     Paris   13
2  Peter    Berlin   53
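
To make the "different data type per column" point concrete, here is a minimal sketch that rebuilds the same dataset under the hypothetical name `people` and inspects its column types. The exact dtype names printed for the text columns depend on the pandas version, so no output is shown.

```python
import pandas as pd

# Same illustrative dataset as above, stored under a hypothetical name
people = pd.DataFrame({
    "Name": ["John", "Anna", "Peter"],
    "Location": ["New York", "Paris", "Berlin"],
    "Age": [24, 13, 53],
})

# One dtype per column: two text columns and one integer column
print(people.dtypes)

# Selecting a single column returns a Series
age = people["Age"]
print(type(age).__name__)
```

Each column of a DataFrame is itself a Series, which is why the two structures share most of their behavior.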


## Series
A Series is a one-dimensional labeled array capable of holding data of any type.

You can create a Series from a list.

In [5]:
# From a list
s = pd.Series([10, 20, 30])
print(s)

0    10
1    20
2    30
dtype: int64


## Create Series
You can also create a Series from a NumPy array or from a dictionary. With a dictionary, the keys become the index labels.

In [7]:
# From a NumPy array
s2 = pd.Series(np.array([5, 6, 7]))
print(s2)

# From a dictionary
data = {'x': 10, 'y': 20}
s3 = pd.Series(data)
print(s3)

0    5
1    6
2    7
dtype: int32
x    10
y    20
dtype: int64
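
The `int32` reported for `s2` most likely reflects NumPy's default integer type on the machine that ran the cell; this default can vary by platform and NumPy version, so the same code may print `int64` elsewhere. A minimal sketch, assuming you want the same dtype everywhere, is to request it explicitly (the name `s_fixed` is illustrative):

```python
import numpy as np
import pandas as pd

# Ask for 64-bit integers instead of relying on the platform default
s_fixed = pd.Series(np.array([5, 6, 7]), dtype="int64")
print(s_fixed.dtype)
```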


## Access to Values and Index
You can access the values and the index of a Series separately through its values and index attributes.

In [20]:
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print('Values:', s.values)
print('Index:', s.index)

Values: [10 20 30]
Index: Index(['a', 'b', 'c'], dtype='object')


## Indexing by String
Set and use string labels as an index.

In [21]:
s = pd.Series([10, 20, 30], index=['apple', 'banana', 'cherry'])
print(s)

apple     10
banana    20
cherry    30
dtype: int64
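
Once a Series has string labels, those labels can be used to look values up. The following is a small sketch of a few common lookups on the same Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["apple", "banana", "cherry"])

print(s["banana"])              # single label -> scalar value
print(s[["apple", "cherry"]])   # list of labels -> smaller Series
print("banana" in s)            # membership test checks the index labels
```

Note that `in` tests the index, not the values, so `20 in s` is `False` here.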


## Access Elements Using Index
You can extract elements by indexing the Series with its labels. A label-based slice such as 'b':'c' includes both endpoints.

In [22]:
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b':'c'])

b    20
c    30
dtype: int64
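
Label slices and positional slices behave differently at the end point. This sketch contrasts the two using the explicit `.loc` (label) and `.iloc` (position) accessors:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s.loc["b":"c"]    # label slice: includes both 'b' and 'c'
by_position = s.iloc[1:2]    # positional slice: stops before position 2

print(by_label)
print(by_position)
```

Using `.loc` and `.iloc` makes it explicit which kind of indexing is intended.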


## DataFrame
A DataFrame is a two-dimensional labeled data structure with rows and columns.

In [13]:
df = pd.DataFrame({'Name': ['Alice', 'Bob','Mary'], 'Age': [25, 30, 35]})
print(df)

    Name  Age
0  Alice   25
1    Bob   30
2   Mary   35


## Create DataFrame from Series
You can combine multiple Series to create a DataFrame.

In [16]:
sales = pd.Series([20000, 35000], index=['Q1', 'Q2'])
expenses = pd.Series([15000, 20000], index=['Q1', 'Q2'])

financials = pd.DataFrame({'Sales': sales, 'Expenses': expenses})
print(financials)

    Sales  Expenses
Q1  20000     15000
Q2  35000     20000
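
When Series are combined this way, pandas matches them by index label rather than by position. The sketch below adds a hypothetical third quarter to `expenses` only, to show what happens when the labels do not fully overlap:

```python
import pandas as pd

sales = pd.Series([20000, 35000], index=["Q1", "Q2"])
# Hypothetical extra quarter, present only in expenses
expenses = pd.Series([15000, 20000, 18000], index=["Q1", "Q2", "Q3"])

financials = pd.DataFrame({"Sales": sales, "Expenses": expenses})
print(financials)

# Q3 has no sales figure, so that cell is marked as missing (NaN)
print(financials["Sales"].isna())
```

Because a missing value is stored as NaN, the `Sales` column becomes a floating-point column in this case.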


## Create DataFrame from List
You can create a DataFrame from a list of dictionaries or other iterable structures.

Note: A list of dictionaries is a Python list in which each element is a dictionary. Each dictionary becomes one row and its keys become column names. The dictionaries may have different keys; a key missing from a dictionary produces a missing value (NaN) in that row.

In [14]:
data = [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}] # List of dictionaries
df = pd.DataFrame(data)
print(df)

    Name  Age
0  Alice   25
1    Bob   30
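
As a sketch of what happens when the dictionaries do not share the same keys, the second record below is hypothetical: it omits `Age` and adds a `City` key.

```python
import pandas as pd

records = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "City": "Paris"},  # hypothetical record: no 'Age', extra 'City'
]
df_records = pd.DataFrame(records)
print(df_records)

# Cells with no source value are filled with NaN
print(df_records.isna())
```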


## Indexing DataFrame
Indexing in pandas allows you to set or reset the index of a DataFrame to optimize data access and analysis.
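
A minimal sketch of setting and resetting an index, using `DataFrame.set_index` and `DataFrame.reset_index` on an illustrative product table (the variable names are hypothetical):

```python
import pandas as pd

products = pd.DataFrame({"Product": ["Laptop", "Phone", "Tablet"],
                         "Price": [1000, 800, 600]})

# Promote the 'Product' column to the index (returns a new DataFrame)
by_product = products.set_index("Product")
print(by_product.loc["Phone", "Price"])

# Move the index back into a regular column and restore 0, 1, 2, ...
restored = by_product.reset_index()
print(restored)
```

Both methods return a new DataFrame by default, so `products` itself is left unchanged.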

## Access to DataFrame Elements
You can access elements in a DataFrame by column or by row. The example below sets a custom index and then selects the 'Price' column.


In [19]:
data = {'Product': ['Laptop', 'Phone', 'Tablet'], 'Price': [1000, 800, 600]}
df = pd.DataFrame(data)
df.index = pd.Index(['A', 'B', 'C'])  # Setting a custom index
print(df['Price'])


A    1000
B     800
C     600
Name: Price, dtype: int64
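
The cell above selects a column. For rows, a common approach is `.loc` (by index label) or `.iloc` (by position); the sketch below applies both to the same product table:

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Laptop", "Phone", "Tablet"],
                   "Price": [1000, 800, 600]},
                  index=["A", "B", "C"])

row_by_label = df.loc["B"]         # row with index label 'B'
row_by_position = df.iloc[0]       # first row, whatever its label
one_cell = df.loc["C", "Price"]    # single value: row 'C', column 'Price'

print(row_by_label)
print(row_by_position)
print(one_cell)
```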


## Add Column
You can add a new column to a DataFrame by assigning values to a column name that does not exist yet.



In [24]:
# Add a column holding 20% of the original price (the discount amount, not the price after the discount)
df['Discounted Price'] = df['Price'] * 0.2
print(df)


  Product  Price  Discounted Price
A  Laptop   1000             200.0
B   Phone    800             160.0
C  Tablet    600             120.0
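
The column above holds 20% of each price, which is the amount taken off rather than the final price. If the price after the discount is what you need, one possible sketch is shown below; the column names `Discount` and `Price After Discount` and the variable `discount_rate` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

prices = pd.DataFrame({"Product": ["Laptop", "Phone", "Tablet"],
                       "Price": [1000, 800, 600]},
                      index=["A", "B", "C"])

discount_rate = 0.2  # assumed 20% discount, matching the cell above

# New columns can be built from existing ones, element by element
prices["Discount"] = prices["Price"] * discount_rate
prices["Price After Discount"] = prices["Price"] - prices["Discount"]
print(prices)
```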


## Delete Column
Remove a column from the DataFrame.


In [25]:
# Delete the 'Discounted Price' column
df.drop('Discounted Price', axis=1, inplace=True)
print(df)

  Product  Price
A  Laptop   1000
B   Phone    800
C  Tablet    600
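
The cell above modifies `df` in place. An alternative sketch, assuming you would rather keep the original DataFrame untouched, is to let `drop` return a new object (the name `trimmed` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Laptop", "Phone", "Tablet"],
                   "Price": [1000, 800, 600],
                   "Discounted Price": [200.0, 160.0, 120.0]})

# drop() returns a new DataFrame; the original keeps all its columns
trimmed = df.drop(columns=["Discounted Price"])

print(trimmed.columns.tolist())
print(df.columns.tolist())
```

The `columns=` keyword is equivalent to passing the name with `axis=1`.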


## Filtering DataFrame
Apply conditions to filter rows in the DataFrame.


In [26]:
# Filter products with price greater than 700
filtered_df = df[df['Price'] > 700]
print(filtered_df)

  Product  Price
A  Laptop   1000
B   Phone    800
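
Filters can also combine several conditions. The sketch below uses `&` for "and" and `Series.isin` for membership on the same product table; the thresholds and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Laptop", "Phone", "Tablet"],
                   "Price": [1000, 800, 600]},
                  index=["A", "B", "C"])

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
mid_range = df[(df["Price"] > 700) & (df["Price"] < 1000)]
print(mid_range)

# isin() keeps rows whose value appears in the given list
selected = df[df["Product"].isin(["Laptop", "Tablet"])]
print(selected)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.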


## Universal Functions in pandas
NumPy universal functions (ufuncs) can be applied directly to pandas objects. The result is a pandas object that keeps the original index.

In [27]:
# Multiply all prices by 2
new_prices = np.multiply(df['Price'], 2)
print(new_prices)

A    2000
B    1600
C    1200
Name: Price, dtype: int64
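
As a further sketch, the example below shows that the ufunc call matches the ordinary `*` operator, and that another ufunc, `np.sqrt`, also returns a Series with the same index. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

prices = pd.Series([1000, 800, 600], index=["A", "B", "C"], name="Price")

doubled_ufunc = np.multiply(prices, 2)   # NumPy universal function
doubled_operator = prices * 2            # equivalent pandas operator

roots = np.sqrt(prices)                  # float Series with the same index
print(doubled_ufunc.equals(doubled_operator))
print(roots.index.tolist())
```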
