# Python For Machine Learning


Python is widely used in machine learning because it offers powerful libraries and tools for data handling, model training, and evaluation. It makes building ML models easier and faster.

#### Key Python Basics to Learn

1. **Data Structures** – Lists, dictionaries, tuples, sets for organizing data.
2. **Control Flow** – `if/else`, `for`, `while` loops to add logic to code.
3. **Functions** – For reusable, modular code.
4. **OOP Basics** – Understanding classes and objects (many ML tools use OOP).
5. **Libraries** – How to import and use libraries like `NumPy` and `pandas`.
6. **File I/O** – Reading/writing data files like CSV.
7. **Error Handling** – Using `try...except` to manage runtime errors.
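
Items 6 and 7 can be sketched together with the standard library alone. This is a minimal illustration, not a recommended project layout: the file names `scores.csv` and `missing.csv` are hypothetical, and a temporary directory is used so the example does not touch your own files.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical file location; a temporary directory keeps the example self-contained
tmp_dir = Path(tempfile.mkdtemp())
csv_path = tmp_dir / "scores.csv"

# Write a small CSV file
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "score"])
    writer.writerow(["Alice", 85])
    writer.writerow(["Bob", 42])

# Read it back as a list of dictionaries (all values come back as strings)
with open(csv_path, newline="") as f:
    rows = list(csv.DictReader(f))
print(rows)

# Handle a missing file instead of letting the program crash
try:
    with open(tmp_dir / "missing.csv") as f:
        f.read()
except FileNotFoundError as err:
    message = f"Could not open file: {err.filename}"
    print(message)
```

Note that the CSV reader returns every value as a string, so numeric columns need an explicit conversion such as `int(row["score"])`.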

#### Essential Libraries

- **NumPy** – For numerical operations with arrays and matrices.
- **Pandas** – For data cleaning, analysis, and working with DataFrames.


### Python Syntax
Syntax refers to the set of rules that define how Python code is written and interpreted.


In [1]:
# Print the even numbers in the list (1 through 6)
numbers = [1, 2, 3, 4, 5, 6]

for number in numbers:
    if number % 2 == 0:
        print(f"{number} is even")


2 is even
4 is even
6 is even


In [2]:
# Take input from the user

name = input("What is your name? ")
age = input("What is your age? ")
print(f"Hi, My name is {name}. I am {age} years old.")

Hi, My name is Nilima . I am 21 years old.


### Python Keywords

Keywords are reserved words in Python that have special meaning and cannot be used as variable names. Examples include `if`, `for`, `class`, `def`, and `return`.

In [3]:
if True:
    print("This is a keyword example.")

This is a keyword example.
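
If you are unsure whether a name is reserved, the standard-library `keyword` module can tell you. The sketch below only illustrates the check; the variable names are chosen for this example.

```python
import keyword

# iskeyword() reports whether a string is a reserved word
is_class_reserved = keyword.iskeyword("class")
is_score_reserved = keyword.iskeyword("score")

print(is_class_reserved)   # True: "class" cannot be used as a variable name
print(is_score_reserved)   # False: "score" is an ordinary identifier

# kwlist holds every keyword for the running Python version
print(keyword.kwlist[:5])
```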


### Variables
Variables are used to store data. Python is dynamically typed, so we don’t need to declare a variable’s type.
Choosing a good variable name is always important.
Do you know why?

In [4]:
first_name = "Generative"
last_name = "AI"
year = 2021
full_name = first_name + " " + last_name + str(year)
print(full_name)

Generative AI2021


In [5]:
qty = 12
price = 12.55

print(f"Total price is {qty*price} ")
print(f"Total price is {(qty*price):.2f} ")

Total price is 150.60000000000002 
Total price is 150.60 
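
Dynamic typing means the type belongs to the value, not to the name. A minimal sketch (the variable names are illustrative only):

```python
value = 10
first_type = type(value).__name__

# The same name can later be bound to a value of a different type
value = "ten"
second_type = type(value).__name__

print(first_type, second_type)   # int str
```

This flexibility is one reason descriptive names matter: the name is often the only hint a reader gets about what a variable holds.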


#### TASK1: Create variables to store your name, age, and list of skills.

In [20]:
name=input('Enter your name:')
age=int(input('Enter your age:'))
skills=[]
count=0
for i in range(int(input('Enter the number of skills you have:'))):
    skills.append(input(f'Skill {i+1}:'))
    count+=1

print(f'My name is {name} and I am {age} years old. My skills are:')
for i in range(count):
    print(f'Skill {i+1}:{skills[i]}')


My name is Nilima Shrestha and I am 21 years old. My skills are:
Skill 1:Python programming language
Skill 2:HTML/CSS
Skill 3:JavaScript
Skill 4:C++
Skill 5:MS Excel


### Data Types

* **int**: Integer type for whole numbers (e.g., `10`).
* **float**: Floating-point type for decimal numbers (e.g., `3.14`).
* **str**: String type for sequences of characters (e.g., `"Hello"`).
* **bool**: Boolean type representing `True` or `False`.
* **NoneType**: Represents the absence of a value (`None`).


In [21]:
# Integer
number = 100
print(type(number))

# Float
pi_value = 3.14159
print(type(pi_value))

# String
greeting = "Good Morning!"
print(type(greeting))

# Boolean
is_active = False
print(type(is_active))

# NoneType
value = None
print(type(value))

# Data structure with data types

# List
my_list = [1, "apple", 3.14, True]
print(type(my_list))

# Tuple
my_tuple = (3, "mango", 1.64, True)
print(type(my_tuple))

# Set
my_set = {1,3,5,7,9}
print(type(my_set))

# dict
my_dictionary = {"name":"Ram","age":"24"}
print(type(my_dictionary))


<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
<class 'NoneType'>
<class 'list'>
<class 'tuple'>
<class 'set'>
<class 'dict'>
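
Because `input()` always returns a string, converting between these types comes up constantly. The following sketch shows the built-in conversion functions; the sample values are made up for illustration.

```python
age_text = "21"              # what input() would give you
age = int(age_text)          # str -> int
price = float("12.55")       # str -> float
label = str(100) + "%"       # int -> str, then concatenation

print(age + 1, price, label)   # 22 12.55 100%

# bool() treats zero and empty values as False; any non-empty string is True
print(bool(0), bool(""), bool("False"))   # False False True
```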


### Operators
An operator is a symbol that performs a specific operation on one or more values or variables.

In [7]:
# Arithmetic Operator (Addition)
sum_result = 5 + 3

# Comparison Operator (Greater Than)
is_greater = 10 > 5

# Logical Operator (AND)
both_true = True and False

# Assignment Operator (Multiplication and Assignment)
x = 5
x *= 2

# Identity Operator (is)
list1 = [1, 2]
list2 = [1, 2]
is_same_object = list1 is list2


print("Sum result:", sum_result)
print("Is 10 greater than 5? :", is_greater)
print("Both True AND False:", both_true)
print("Value of x after *= 2:", x)
print("list1 is list2:", is_same_object)


Sum result: 8
Is 10 greater than 5? : True
Both True AND False: False
Value of x after *= 2: 10
list1 is list2: False
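
The identity result above is easier to interpret next to an equality check. A short sketch contrasting `==` (same contents) with `is` (same object); `alias` is a name introduced for this example.

```python
list1 = [1, 2]
list2 = [1, 2]
alias = list1                  # a second name for the same list object

equal_values = list1 == list2  # True: the contents match
same_object = list1 is list2   # False: two separate list objects
alias_is_same = alias is list1 # True: both names refer to one object

print(equal_values, same_object, alias_is_same)   # True False True
```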


#### TASK2: Create a program that checks if a student has passed a test and gives feedback.

In [23]:
std_marks=float(input("Enter marks of the student:"))
print('Pass!' if std_marks>=50 else 'Fail!')
print('Excellent, Good Job' if std_marks>=90 else 'Good' if std_marks>=80 else 'Nice' if std_marks>=70 else 'Satisfactory! try more' if std_marks>=50 else 'Not good')


Pass!
Satisfactory! try more


In [8]:
your_marks = int(input("Enter your marks: "))
if your_marks > 40:
    print("You have passed the test")
else:
    print("You have failed the test")

You have failed the test


In [9]:
marks = [23,43,54,40,39,41]

pass_or_fail = [each for each in marks if each>=40 ]
pass_or_fail


[43, 54, 40, 41]

### Data Structures (Lists, Tuples, Sets, Dictionaries)

* **list**: Ordered, mutable collection (e.g., `[1, 2, 3]`).
* **tuple**: Ordered, immutable collection (e.g., `(1, 2, 3)`).
* **dict**: Mutable collection of key-value pairs; preserves insertion order since Python 3.7 (e.g., `{"a": 1}`).
* **set**: Unordered collection of unique items (e.g., `{1, 2, 3}`).

In [24]:
# List (mutable, ordered)
my_shopping_list = ["milk", "bread", "eggs"]
my_shopping_list.append("cheese")
print(my_shopping_list[0])

# Tuple (immutable, ordered)
coordinates = (10.0, 20.0)
# coordinates[0] = 12.0 # This would raise an error
print(coordinates[1])

# Set (mutable, unordered, unique elements)
my_set = {1, 2, 2, 3, 4, 4}
print(my_set)
my_set.add(5)
print(my_set)

# Dictionary (mutable, key-value pairs)
person = {"name": "Alice", "age": 30, "city": "New York"}
print(person["name"])
person["age"] = 31 # changing the value
print(person)


milk
20.0
{1, 2, 3, 4}
{1, 2, 3, 4, 5}
Alice
{'name': 'Alice', 'age': 31, 'city': 'New York'}


In [25]:
# Nested data structure (List of dictionaries)
students = [
    {"name": "Bob", "id": "s001"},
    {"name": "Charlie", "id": "s002"}
]
print(students[0]["name"])

Bob


### Control Flow (if, for, while)
Control flow in Python refers to the order in which individual statements, instructions, or function calls are executed or evaluated. It includes decision-making (`if`, `elif`, `else`), loops (`for`, `while`), and control statements (`break`, `continue`, `pass`) that determine how the program proceeds.


In [26]:
# If statement
temperature = 25
if temperature > 30:
    print("It's hot!")
elif temperature > 20:
    print("It's warm.")
else:
    print("It's cool.")

# For loop (iterating over a list)
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)

# While loop
count = 0
while count < 5:
    print(count)
    count += 1

# For loop with range
for i in range(3):
    print(i)

# Break in a loop
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    if num == 3:
        break
    print(num)



It's warm.
apple
banana
cherry
0
1
2
3
4
0
1
2
1
2
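
The description above also names `continue` and `pass`. A minimal sketch of both, using a made-up list:

```python
# continue: skip the rest of the current iteration and move on to the next one
kept = []
for num in [1, 2, 3, 4, 5]:
    if num == 3:
        continue
    kept.append(num)
print(kept)   # [1, 2, 4, 5]

# pass: a placeholder statement that does nothing
odd_numbers = []
for num in [1, 2, 3]:
    if num % 2 == 0:
        pass                      # nothing to do for even numbers yet
    else:
        odd_numbers.append(num)
print(odd_numbers)   # [1, 3]
```

Compare this with `break` in the cell above: `break` ends the loop entirely, while `continue` only skips one iteration.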


#### TASK3: Write a program to:
- Take input: scores of 10 students, or just create your own list.
- Add 5 bonus points to every score below 50.
- Print each score with a label: "Pass" if 50 or above, else "Fail".
- Finally, print the updated list of scores.

In [31]:
scores=[]

for i in range(10):
    scores.append(int(input(f'Score {i+1}:')))
print(scores)
updated_scores=[score+5 if score <50 else score for score in scores]
updated_scores


[79, 45, 34, 76, 34, 23, 87, 34, 43, 23]


[79, 50, 39, 76, 39, 28, 87, 39, 48, 28]

In [13]:
score = []
for i in range(3):
    score.append(int(input(f"Enter score for student {i+1}: ")))
updated_score = [each+5 if each <40 else each for each in score]

print(updated_score)

## need more update

[90, 78, 45]
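
One possible way to cover every step of TASK3, sketched with a fixed list instead of `input()`. It assumes the "Pass"/"Fail" label is applied after the bonus has been added, which the task statement leaves open.

```python
# Sample scores (same values as the run shown earlier)
scores = [79, 45, 34, 76, 34, 23, 87, 34, 43, 23]

# Add 5 bonus points to every score below 50
updated_scores = [score + 5 if score < 50 else score for score in scores]

# Label each updated score: "Pass" if 50 or above, else "Fail"
labels = ["Pass" if score >= 50 else "Fail" for score in updated_scores]
for score, label in zip(updated_scores, labels):
    print(f"{score}: {label}")

# Finally, print the updated list
print(updated_scores)
```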


## Functions
Functions are reusable blocks of code that perform specific tasks. They help organize code and reduce repetition. Functions can accept inputs (called parameters) and may return outputs, and they are a fundamental concept in almost all programming languages.
- Defining and calling functions
- Arguments and return values
- Lambda functions

### Defining and calling functions


In [32]:
# Simple function with no arguments or return value
def greet():
  print("Hello!")

greet()

# Function with one argument
def say_hello(name):
  print(f"Hello, {name}!")

say_hello("Alice")

# Function with a default argument
def greeting(name="Guest"):
  print(f"Welcome, {name}!")

greeting()
greeting("Bob") # Overrides the default argument

# Function with multiple arguments
def add_numbers(x, y):
  print(f"The sum of {x} and {y} is {x + y}")

add_numbers(5, 10)

# Calling a function inside another function
def process_data(data):
  print("Processing data...")
  # process
  print(f"Data: {data}")

def analyze_data(dataset):
  print("Analyzing data...")
  # analysis
  process_data(dataset)

analyze_data([1, 2, 3, 4, 5])


Hello!
Hello, Alice!
Welcome, Guest!
Welcome, Bob!
The sum of 5 and 10 is 15
Analyzing data...
Processing data...
Data: [1, 2, 3, 4, 5]


### Arguments and return values


In [33]:
# Function that returns a value
def multiply(x, y):
  result = x * y
  return result

product = multiply(4, 6)
print(f"The product is: {product}")

# Function that returns multiple values (as a tuple)
def get_min_max(numbers):
  minimum = min(numbers)
  maximum = max(numbers)
  return minimum, maximum

data = [10, 5, 20, 15]
min_val, max_val = get_min_max(data)
print(f"Min value: {min_val}, Max value: {max_val}")

# Function with no explicit return (implicitly returns None)
def print_message(message):
  print(message)
  # No return statement

return_value = print_message("This function returns None")
print(f"The return value is: {return_value}")



The product is: 24
Min value: 5, Max value: 20
This function returns None
The return value is: None


In [37]:
# Function returning a boolean
def is_even(number):
  return number % 2 == 0

print(f"Is 7 even? {is_even(7)}")
print(f"Is 10 even? {is_even(10)}")

Is 7 even? False
Is 10 even? True


In [38]:

# Using a returned value in a calculation
def calculate_area(radius):
  return 3.14*radius*radius

circle_area = calculate_area(5)
print(f"The area of the circle is: {circle_area:.2f}")



The area of the circle is: 78.50


### Lambda functions
A lambda function is a small, anonymous function defined using the lambda keyword. It can have any number of arguments but only one expression, which is returned automatically.

In [39]:
# Simple lambda function
add = lambda x, y: x + y
print(f"Lambda addition: {add(3, 5)}")

# Lambda function used with filter()
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = list(filter(lambda n: n % 2 == 0, numbers))
print(f"Even numbers: {even_numbers}")

# Lambda function used with map()
numbers = [1, 2, 3, 4, 5]
squared_numbers = list(map(lambda x: x**2, numbers))
print(f"Squared numbers: {squared_numbers}")

# Lambda function used for sorting
students = [('Alice', 30), ('Bob', 25), ('Charlie', 35)]
# Sort by age (the second element of the tuple)
sorted_students_by_age = sorted(students, key=lambda student: student[1])
print(f"Sorted students by age: {sorted_students_by_age}")

# Lambda function with multiple arguments in a function call
def apply_operation(x, y, operation):
  return operation(x, y)

result = apply_operation(10, 20, lambda a, b: a - b)
print(f"Lambda operation result: {result}")

Lambda addition: 8
Even numbers: [2, 4, 6, 8, 10]
Squared numbers: [1, 4, 9, 16, 25]
Sorted students by age: [('Bob', 25), ('Alice', 30), ('Charlie', 35)]
Lambda operation result: -10


## Object-Oriented Programming (OOP)
- Classes and Objects
- Methods and Constructors
- Inheritance

In [40]:
class Animal:
    def __init__(self, name):
        self.name = name
    def speak(self):
        return f"{self.name} makes a sound"

dog = Animal("Dog")
print(dog.speak())

Dog makes a sound


### Classes and Objects
**Classes** are blueprints for creating objects, defining the structure and behavior (methods and attributes) they will have. They encapsulate data and functions into a single unit.

**Objects** are instances of classes that hold actual data and can perform actions using class-defined methods. Each object can have different values for its attributes.


In [41]:
# Basic Class and Object Creation
class Dog:
    species = "Canis familiaris"
    def __init__(self, name, age):
        self.name = name
        self.age = age
        print(f"A new dog named {self.name} is born!") # Constructor action

    # Instance method
    def bark(self):
        print(f"{self.name} says Woof!")

    # Destructor
    def __del__(self):
        print(f"{self.name} is leaving us.")

my_dog = Dog("Buddy", 3)
your_dog = Dog("Lucy", 5)


print(f"{my_dog.name} is {my_dog.age} years old.")
print(f"{your_dog.name} is {your_dog.age} years old.")

my_dog.bark()
your_dog.bark()

print(f"All dogs are of species: {Dog.species}")
print(f"My dog is of species: {my_dog.species}")

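# __del__ runs when an object is garbage collected; in CPython that happens as soon as its reference count drops to zero.
# `del` only removes the name `your_dog`; here that name was the last reference, so __del__ runs right away.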
del your_dog


A new dog named Buddy is born!
A new dog named Lucy is born!
Buddy is 3 years old.
Lucy is 5 years old.
Buddy says Woof!
Lucy says Woof!
All dogs are of species: Canis familiaris
My dog is of species: Canis familiaris
Lucy is leaving us.
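
The difference between a class attribute (shared, like `species`) and an instance attribute (per object, like `name`) can be sketched as follows. The `Pet` class is hypothetical and kept separate from `Dog` so it does not interfere with the cell above.

```python
class Pet:
    species = "Canis familiaris"      # class attribute: shared by all instances

    def __init__(self, name):
        self.name = name              # instance attribute: unique to each object

buddy = Pet("Buddy")
lucy = Pet("Lucy")

# Assigning through an instance creates a new attribute on that instance only
buddy.species = "Canis lupus"

print(buddy.species)   # Canis lupus (instance attribute shadows the class attribute)
print(lucy.species)    # Canis familiaris (still reads the class attribute)
print(Pet.species)     # Canis familiaris
```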


### Methods and Constructors
**Methods** are functions defined inside a class that operate on objects of that class. They are used to define the behavior or actions an object can perform.  
**Constructors** are special methods that are automatically called when a new object is created.

In [42]:
# __init__ method
# The `__init__` method in a class is a special constructor function that automatically runs when a new object is created. It initializes the object’s attributes with the values provided during instantiation.


In [43]:
# Class with another method and using 'self'
class Car:
    def __init__(self, make, model, year):
        self.make = make
        self.model = model
        self.year = year
        self.is_engine_started = False

    def start_engine(self):
        if not self.is_engine_started:
            self.is_engine_started = True
            print(f"The {self.year} {self.make} {self.model}'s engine started.")
        else:
            print("The engine is already running.")

    def display_info(self):
        print(f"Car: {self.year} {self.make} {self.model}")

my_car = Car("Toyota", "Camry", 2020)

my_car.display_info()
my_car.start_engine()
my_car.start_engine()




Car: 2020 Toyota Camry
The 2020 Toyota Camry's engine started.
The engine is already running.


### Inheritance
Inheritance is an object-oriented programming concept where a class (called a child or subclass) inherits properties and behaviors (methods and attributes) from another class (called a parent or superclass).

In [44]:
class Animal:
    def __init__(self, name):
        self.name = name
        print(f"An animal named {self.name} is created.")

    def speak(self):
        pass

    def __del__(self):
        print(f"The animal {self.name} is no longer with us.")


class Cat(Animal):
    def __init__(self, name, breed):
        super().__init__(name)  # Call the parent class constructor
        self.breed = breed
        print(f"A cat named {self.name} of breed {self.breed} is created.")


    # Override the speak method
    def speak(self):
        print(f"{self.name} says Meow!")

    def __del__(self):
        print(f"The cat {self.name} is leaving us.")
        super().__del__()


my_cat = Cat("Whiskers", "Siamese")


my_cat.speak()
print(f"My cat's name is {my_cat.name} and breed is {my_cat.breed}.")

# The destructors will be called when the objects are deleted or go out of scope
# del my_cat




An animal named Whiskers is created.
A cat named Whiskers of breed Siamese is created.
Whiskers says Meow!
My cat's name is Whiskers and breed is Siamese.
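
A child class can also extend a parent method rather than replace it, and `isinstance()` / `issubclass()` let you inspect the relationship. The `Employee` and `Manager` classes below are hypothetical names used only for this sketch.

```python
class Employee:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is an employee"


class Manager(Employee):
    def __init__(self, name, team_size):
        super().__init__(name)        # reuse the parent constructor
        self.team_size = team_size

    def describe(self):
        base = super().describe()     # extend the parent method instead of replacing it
        return f"{base} who manages {self.team_size} people"


boss = Manager("Sita", 4)
description = boss.describe()
print(description)

print(isinstance(boss, Manager))      # True
print(isinstance(boss, Employee))     # True: a Manager is also an Employee
print(issubclass(Manager, Employee))  # True
```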


In [45]:
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius

    def get_area(self):
        return math.pi * self.radius**2

    def get_circumference(self):
        return 2 * math.pi * self.radius

circle1 = Circle(5)
circle2 = Circle(10)

print(f"Area of circle1: {circle1.get_area():.2f}")
print(f"Circumference of circle2: {circle2.get_circumference():.2f}")

Area of circle1: 78.54
Circumference of circle2: 62.83


#### TASK4: Complete the methods to add a book, display available books, and borrow a book from the library.

In [49]:
class Books:
    def __init__(self):
        self.book_list=[]
    def add_book(self,b):
        self.book_list.append(b)
        print('Book is added!')
    def display_book(self):
        for i in self.book_list:
            print(i)
    def borrow_book(self,b):
        if b in self.book_list:
            print('The book is available')
            self.book_list.remove(b)
            print('You have successfully borrowed the book')
        else:
            print('Book is unavailable')

book=Books()
book.add_book('The power of your subconscious mind')
book.add_book('It ends with us')
book.add_book('Atomic habits')

book.display_book()
book.borrow_book('Atomic habits')
book.display_book()

book.borrow_book('Atomic habits')

Book is added!
Book is added!
Book is added!
The power of your subconscious mind
It ends with us
Atomic habits
The book is available
You have successfully borrowed the book
The power of your subconscious mind
It ends with us
Book is unavailable


In [47]:
class Library:
    def __init__(self):
        self.books = []

    def add_book(self, book):
        """
        Add a new book to the library.
        """
        self.books.append(book)
        print(f"{book} added.")

    def display_books(self):
        """
        Display all available books in the library.
        """
      
        print("Available books:")
        for book in self.books:
            print(book)
        return self.books

    def borrow_book(self, book):
        """
        Borrow a book if it's available.
        """
        # Remove the book and report success; otherwise report that it is unavailable
        if book in self.books:
            self.books.remove(book)
            return "borrowed"
        return "unavailable"



lib = Library()

# Add books
lib.add_book("1984")
lib.add_book("To Kill a Mockingbird")

# Display books
print(lib.display_books())

# Borrow a book
print(lib.borrow_book("1984"))

# Display again to see the update
lib.display_books()


1984 added.
To Kill a Mockingbird added.
Available books:
1984
To Kill a Mockingbird
['1984', 'To Kill a Mockingbird']
borrowed
Available books:
To Kill a Mockingbird


['To Kill a Mockingbird']

## Review of Python Packages for Data (NumPy and Pandas)

### NumPy
NumPy is a Python library used for fast numerical computations, especially with arrays and matrices.
We will cover:
- Arrays
- Array Operations
- Indexing and Slicing

**Why** are we using NumPy?

- Fast and efficient operations on large arrays and matrices

- Built-in functions for mathematical, statistical, and linear algebra tasks


In [50]:
import numpy as np
import time

# Using list
list_data = list(range(1_000_000))
start = time.time()
list_squared = [x**2 for x in list_data]
print("Python list time:", time.time() - start)

# Using NumPy
array_data = np.arange(1_000_000)
start = time.time()
array_squared = array_data ** 2
print("NumPy array time:", time.time() - start)


Python list time: 0.3316216468811035
NumPy array time: 0.010204792022705078


In [51]:
# Vector operation
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print("Sum:", a + b)
print("Element-wise product:", a * b)


Sum: [5 7 9]
Element-wise product: [ 4 10 18]


In [52]:
# Matrix
matrix = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 10
print("Add scalar to matrix:\n", matrix + scalar)


Add scalar to matrix:
 [[11 12 13]
 [14 15 16]]


In [53]:
# Boolean indexing (masking)
arr = np.array([10, 20, 30, 40, 50])
print("Elements > 25:", arr[arr > 25])


Elements > 25: [30 40 50]


In [54]:
# Statistical operations
data = np.array([1, 2, 3, 4, 5])
print("Mean:", np.mean(data))
print("Standard Deviation:", np.std(data))
print("Sum:", np.sum(data))


Mean: 3.0
Standard Deviation: 1.4142135623730951
Sum: 15


In [55]:
# multi-dimensional array
a = np.array([[1, 2], [3, 4], [5, 6]])
print("Shape:", a.shape)
print("Access element [1, 1]:", a[1, 1])


Shape: (3, 2)
Access element [1, 1]: 4
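
Slicing works along each axis with the same `start:stop` syntax as lists. The sketch below reuses the 3x2 array from the cell above; `b` and `view` are names introduced for this example.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

first_row = a[0]             # row 0
second_column = a[:, 1]      # every row, column 1
last_two_rows = a[1:, :]     # rows 1 and 2, all columns

print(first_row)             # [1 2]
print(second_column)         # [2 4 6]
print(last_two_rows)

# Slices are views: writing through a slice changes the array it came from
b = a.copy()                 # copy first so `a` stays untouched
view = b[0]
view[0] = 100
print(b[0, 0], a[0, 0])      # 100 1
```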


In [56]:
# linear algebra
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 2]])
print("Matrix multiplication:\n", A @ B)
print("Inverse of A:\n", np.linalg.inv(A))


Matrix multiplication:
 [[ 4  4]
 [10  8]]
Inverse of A:
 [[-2.   1. ]
 [ 1.5 -0.5]]


In [57]:
# handle missing values
data = np.array([1, 2, np.nan, 4])
print("Mean (ignoring NaNs):", np.nanmean(data))


Mean (ignoring NaNs): 2.3333333333333335


#### TASK5: Create a NumPy array of 100 random integers between 1 and 1000.
- Find the mean, median, and standard deviation of the array.
- Replace all values below the mean with the mean value itself.
- Print the updated array.

In [58]:
import numpy as np

# Create an array of 100 random integers between 1 and 1000.
# randint excludes the upper bound, so 1001 is needed to include 1000.
arr = np.random.randint(1, 1001, 100)
# Find the mean, median, and standard deviation of the array.
print("Mean:", np.mean(arr))
print("Median:", np.median(arr))
print("Standard Deviation:", np.std(arr))

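# Replace all values below the mean with the mean value itself.
# arr has an integer dtype, so the float mean is truncated on assignment (456.73 is stored as 456).
arr[arr < np.mean(arr)] = np.mean(arr)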

print(len(arr))
print(arr)

Mean: 456.73
Median: 467.5
Standard Deviation: 283.0263187408549
100
[694 545 715 456 698 582 456 456 542 519 779 882 947 456 456 456 484 456
 456 583 456 776 456 456 468 739 456 456 456 783 892 456 616 456 629 456
 456 566 456 456 456 456 456 456 456 557 472 456 740 766 456 456 704 456
 824 456 581 588 456 456 456 470 456 456 853 456 860 956 815 760 568 456
 456 753 684 830 508 814 877 855 456 456 467 636 646 456 518 698 456 585
 456 668 998 962 456 456 456 456 456 456]
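
In the output above the mean of 456.73 appears as 456 because the array stores integers. If the exact mean should be kept, one option is `np.where`, which builds a new float array. This is a sketch under assumptions: the seed is arbitrary and only makes the run repeatable, and `rng`, `mean_value`, and `updated` are names chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)        # arbitrary seed for repeatability
arr = rng.integers(1, 1001, size=100)      # upper bound is exclusive, so values are 1..1000

mean_value = arr.mean()

# np.where returns a new array; mixing a float mean with int values gives a float result
updated = np.where(arr < mean_value, mean_value, arr)

print(updated.dtype)
print(updated[:10])
```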


## Pandas
Pandas is a Python library for data manipulation and analysis, providing easy-to-use data structures like DataFrames.  
We will cover:
- Series and DataFrames
- Reading and Writing Data
- Data Selection and Filtering

#### **Why** are we using Pandas?
- It simplifies handling and cleaning of structured data (like CSVs, tables).
- It offers powerful tools for data analysis and transformation with intuitive syntax.

In [59]:
# Importing pandas
import pandas as pd

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

In [60]:
# 1d Series
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
series

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [70]:
# DataFrame: a 2-D, table-like structure
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Ram', None, 'Eva'],
    'Age': [25, 30, None, 34, 22, 28],
    'City': ['New York', 'San Francisco', 'Los Angeles', 'Kathmandu', 'Chicago', 'Houston'],
    'Occupation': ['Engineer', 'Artist', 'Doctor', 'Teacher', 'Developer', 'Designer'],
    'Marital Status': ['Single', 'Married', 'Single', 'Married', 'Single', 'Single']
}

df = pd.DataFrame(data)
df

|   | Name | Age | City | Occupation | Marital Status |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | New York | Engineer | Single |
| 1 | Bob | 30.0 | San Francisco | Artist | Married |
| 2 | Charlie | NaN | Los Angeles | Doctor | Single |
| 3 | Ram | 34.0 | Kathmandu | Teacher | Married |
| 4 | None | 22.0 | Chicago | Developer | Single |
| 5 | Eva | 28.0 | Houston | Designer | Single |


### Reading and Writing Data
#### Reading from different file formats
```python
df = pd.read_csv('data.csv')
df = pd.read_json('data.json')
df = pd.read_csv('data.txt', delimiter='\t')
```
#### Writing data to different file formats
```python
df.to_csv('output.csv', index=False)
df.to_json('output.json', orient='records', lines=True)  # lines=True requires orient='records'; without it, orient can also be 'split', 'index', 'columns', 'values', or 'table'
df.to_csv('output.txt', sep='\t', index=False)
```
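
A quick way to check that a write and a read agree is a round trip. The sketch below assumes a temporary directory and made-up data, so the paths and the names `df_out`, `csv_back`, and `json_back` are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd

df_out = pd.DataFrame({"Name": ["Alice", "Bob"], "Marks": [85, 42]})
tmp_dir = Path(tempfile.mkdtemp())   # temporary location, so no real files are overwritten

# CSV round trip
csv_path = tmp_dir / "output.csv"
df_out.to_csv(csv_path, index=False)
csv_back = pd.read_csv(csv_path)

# JSON Lines round trip: a file written with lines=True must be read with lines=True
json_path = tmp_dir / "output.json"
df_out.to_json(json_path, orient="records", lines=True)
json_back = pd.read_json(json_path, lines=True)

print(csv_back)
print(json_back)
```

Passing `index=False` to `to_csv` matters here: without it the row index is written as an extra unnamed column and comes back as `Unnamed: 0` on read.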

In [67]:
### Load any of your data here...
datas = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Subject": ["Math", "Science", "English", "History", "Math"],
    "Marks": [85, 42, 78, 33, 91]
}
df2 = pd.DataFrame(datas)

### Data Inspection

In [68]:
df2.head()

|   | Name | Subject | Marks |
|---|---|---|---|
| 0 | Alice | Math | 85 |
| 1 | Bob | Science | 42 |
| 2 | Charlie | English | 78 |
| 3 | David | History | 33 |
| 4 | Eva | Math | 91 |


In [69]:
df2.tail()

|   | Name | Subject | Marks |
|---|---|---|---|
| 0 | Alice | Math | 85 |
| 1 | Bob | Science | 42 |
| 2 | Charlie | English | 78 |
| 3 | David | History | 33 |
| 4 | Eva | Math | 91 |


#### Getting basic information about the DataFrame

In [65]:
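df2.info()        # column names, non-null counts, and dtypes
df2.columns       # column labels
df2.dtypes        # data type of each column
df2.describe()    # summary statistics for numeric columns
df2.T             # transpose: rows become columns
sorted_column_df = df2.sort_index(axis=1, ascending=True)     # sort columns by label
sorted_rows_df = df2.sort_values(by='Name', ascending=True)   # sort rows by the 'Name' column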


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Name     5 non-null      object
 1   Subject  5 non-null      object
 2   Marks    5 non-null      int64 
dtypes: int64(1), object(2)
memory usage: 252.0+ bytes


In [71]:
ages = df['Age']
subset = df[['Name', 'City']]
df2 = df[0:3]
selected_row = df[df['City'] == 'New York']
df.loc[:, ["Name", "Age"]]


|   | Name | Age |
|---|---|---|
| 0 | Alice | 25.0 |
| 1 | Bob | 30.0 |
| 2 | Charlie | NaN |
| 3 | Ram | 34.0 |
| 4 | None | 22.0 |
| 5 | Eva | 28.0 |
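
Selection by label (`loc`), by position (`iloc`), and filtering on more than one condition can be sketched as follows. The `people` frame is a small made-up table for this example.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Ram", "Eva"],
    "Age": [25, 30, 34, 28],
    "City": ["New York", "San Francisco", "Kathmandu", "Houston"],
})

by_label = people.loc[1, "Name"]      # row label 1, column "Name"
by_position = people.iloc[0, 1]       # first row, second column
print(by_label, by_position)          # Bob 25

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
filtered = people[(people["Age"] > 26) & (people["City"] != "Kathmandu")]
print(filtered)
```

Python's `and` / `or` do not work element-wise on Series, which is why the bitwise operators are used here.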


In [72]:
# Concatenation is a method for combining DataFrames:

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value2': [4, 5, 6]})


concatenated_df2 = pd.concat([df1,df2])

print(concatenated_df2)

  key  value1  value2
0   A     1.0     NaN
1   B     2.0     NaN
2   C     3.0     NaN
0   A     NaN     4.0
1   B     NaN     5.0
2   D     NaN     6.0


In [73]:
# Joining DataFrames:
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value2': [4, 5, 6]})

# merge() defaults to an inner join on the columns both frames share (here, 'key')
merged_df = pd.merge(df1, df2)
print(merged_df)

# Equivalent call that names the join column explicitly
merged_df = pd.merge(df1, df2, on='key')

  key  value1  value2
0   A       1       4
1   B       2       5
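
The inner join above drops keys `C` and `D` because each appears in only one frame. A sketch of two other join types, using the same two frames; `outer` and `left` are names introduced for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value2': [4, 5, 6]})

# Outer join: keep every key from both frames; indicator=True adds a _merge column
# that records where each row came from
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(outer)

# Left join: keep every key from df1, filling value2 with NaN where df2 has no match
left = pd.merge(df1, df2, on='key', how='left')
print(left)
```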


In [74]:
# Sample DataFrame with NaN values
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, np.nan, 35, 40, np.nan],
    'Score': [85, 90, np.nan, 88, 92]
}

df3 = pd.DataFrame(data)
df3

|   | Name | Age | Score |
|---|---|---|---|
| 0 | Alice | 25.0 | 85.0 |
| 1 | Bob | NaN | 90.0 |
| 2 | Charlie | 35.0 | NaN |
| 3 | David | 40.0 | 88.0 |
| 4 | Eva | NaN | 92.0 |


In [75]:
# Fill NaN values with the mean of their respective columns using lambda
df3[['Age', 'Score']] = df3[['Age', 'Score']].apply(lambda col: col.fillna(col.mean()))
df3

|   | Name | Age | Score |
|---|---|---|---|
| 0 | Alice | 25.0 | 85.0 |
| 1 | Bob | 33.333333 | 90.0 |
| 2 | Charlie | 35.0 | 88.75 |
| 3 | David | 40.0 | 88.0 |
| 4 | Eva | 33.333333 | 92.0 |
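
The same fill can be written without `apply` and a lambda, since `fillna` accepts a Series of per-column fill values. This is an alternative sketch, not a correction of the cell above; `scores_df` and `filled` are names chosen for this example.

```python
import numpy as np
import pandas as pd

scores_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, np.nan, 35, 40, np.nan],
    'Score': [85, 90, np.nan, 88, 92],
})

# mean(numeric_only=True) returns one mean per numeric column;
# fillna matches those means to columns by name and leaves 'Name' untouched
column_means = scores_df.mean(numeric_only=True)
filled = scores_df.fillna(column_means)
print(filled)
```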


In [76]:
# Grouping by 'City' and applying multiple aggregations
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Helen'],
    'Age': [25, 30, 35, 40, 22, 28, 33, 38],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago', 'Chicago', 'Los Angeles', 'New York', 'Chicago'],
    'Marital Status': ['Single', 'Married', 'Single', 'Married', 'Single', 'Single', 'Married', 'Single']
}

df4 = pd.DataFrame(data)


In [77]:
aggregated = df4.groupby('City').agg({
    'Age': ['mean', 'max'],
    'Name': 'count'
}).reset_index()
aggregated

|   | City | Age (mean) | Age (max) | Name (count) |
|---|---|---|---|---|
| 0 | Chicago | 33.333333 | 40 | 3 |
| 1 | Los Angeles | 29.0 | 30 | 2 |
| 2 | New York | 31.0 | 35 | 3 |
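
Passing a dictionary of lists to `agg` produces two-level column headers, as in the result above. Named aggregation is one way to get flat, readable column names instead. The sketch rebuilds the relevant columns of the data; `city_df`, `mean_age`, `max_age`, and `people` are names chosen for this example.

```python
import pandas as pd

city_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Helen'],
    'Age': [25, 30, 35, 40, 22, 28, 33, 38],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago',
             'Chicago', 'Los Angeles', 'New York', 'Chicago'],
})

# Each keyword is the output column name; each tuple is (input column, aggregation)
flat = city_df.groupby('City').agg(
    mean_age=('Age', 'mean'),
    max_age=('Age', 'max'),
    people=('Name', 'count'),
).reset_index()
print(flat)
```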


In [78]:
grouped = df4.groupby('Marital Status').agg({'Age':'mean'})
grouped

| Marital Status | Age |
|---|---|
| Married | 34.333333 |
| Single | 29.6 |


#### TASK6:
- Fill missing values in the "Grade" column with "Pending".
- Filter and display students who scored more than 75.
- Group the data by "Subject" and calculate the average score.

In [87]:
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva", "Frank"],
    "Subject": ["Math", "Science", "English", "Math", "Science", "English"],
    "Score": [88, 67, 92, 74, 59, 81],
    "Grade": ["A", np.nan, "A+", "B", np.nan, "A"]
}

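# Build the DataFrame from the dictionary above
df = pd.DataFrame(data)

# Fill missing values in the "Grade" column with "Pending".
# Assigning the result back avoids in-place modification of a selected column.
df['Grade'] = df['Grade'].fillna("Pending")

# Filter students who scored more than 75
high_scores = df[df['Score'] > 75]

# Group by subject and compute the average score
avg_score = df.groupby("Subject")["Score"].mean()

print(high_scores, avg_score, sep='\n')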


      Name  Subject  Score Grade
0    Alice     Math     88     A
2  Charlie  English     92    A+
5    Frank  English     81     A
Subject
English    86.5
Math       81.0
Science    63.0
Name: Score, dtype: float64


In [88]:
# Filter and display students who scored more than 75.
df[df['Score'] > 75]

|   | Name | Subject | Score | Grade |
|---|---|---|---|---|
| 0 | Alice | Math | 88 | A |
| 2 | Charlie | English | 92 | A+ |
| 5 | Frank | English | 81 | A |


In [89]:
# Group the data by "Subject" and calculate the average score.
df.groupby('Subject').agg({'Score':'mean'})

| Subject | Score |
|---|---|
| English | 86.5 |
| Math | 81.0 |
| Science | 63.0 |
