<a href="https://colab.research.google.com/github/lovnishverma/Python-Getting-Started/blob/main/Day_2_modules%20and%20file%20handling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **File Handling and Modules in Python**

**File Handling in Python**

In [118]:
# create a new file sample.txt

file = open("sample.txt","w")

file.write("hello world")

file.close()

In [119]:
# reading from files

file = open("sample.txt","r")

content = file.read()
print(content)

file.close()

hello world


In [120]:
# overwrite a file
with open("sample.txt","w") as file:
    file.write("hello python")

In [121]:
# reading from files

file = open("sample.txt","r")

content = file.read()
print(content)

file.close()

hello python
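
The cells above cover the `"w"` and `"r"` modes. As a minimal sketch of a third common mode, `"a"` (append) adds to the end of a file instead of overwriting it. The file name `append_demo.txt` is a hypothetical name, chosen so the example does not touch `sample.txt`.

```python
# Start from a known state: "w" creates the file or truncates an existing one.
with open("append_demo.txt", "w") as file:
    file.write("hello world\n")

# "a" keeps the existing content and writes at the end of the file.
with open("append_demo.txt", "a") as file:
    file.write("hello python\n")

# Iterating over a file object yields one line at a time.
with open("append_demo.txt", "r") as file:
    lines = [line.rstrip("\n") for line in file]

print(lines)
```

The `with` form closes the file automatically, even if an error occurs inside the block, so no explicit `file.close()` is needed.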


# **PYTHON MODULES**

A Python module is a file containing Python code (definitions and statements) that can be imported and used in other Python programs. Modules are fundamental to Python's modular programming paradigm, which promotes simplicity, maintainability, and code reuse by organizing code into separate, logical files.

Python modules are primarily categorized by their origin and availability into **built-in**, **user-defined**, and **third-party modules**.

**Using built-in modules**

*e.g.* **math**

In [122]:
import math
print(math.sqrt(25))
print(math.pi)

5.0
3.141592653589793


*e.g.* **random**

In [123]:
import random
print(random.randint(1,10))

7
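
The value printed above changes from run to run. If a repeatable sequence is needed, for example while debugging, the generator can be seeded. This is only a sketch; the seed value 42 is arbitrary.

```python
import random

random.seed(42)  # arbitrary seed; any fixed value gives a repeatable sequence
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(42)  # reseeding with the same value restarts the same sequence
second_run = [random.randint(1, 10) for _ in range(5)]

# Both lists hold the same five numbers, each between 1 and 10 inclusive.
print(first_run == second_run)
```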


# Import Variants

1. **Normal import**

In [124]:
import math

2. **Import specific functions**

In [125]:
from math import sqrt, pi
print(f'sqrt of 25 is: {sqrt(25)}, and value of pi is: {pi}')

sqrt of 25 is: 5.0, and value of pi is: 3.141592653589793


3. **Alias**

In [126]:
import pandas as pd  # pd is an alias for the pandas module
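
As a rough illustration of how the three variants differ at the call site, the sketch below uses only the standard-library `math` module. The alias `m` is an arbitrary choice for this example.

```python
import math                 # normal import: names are reached through the module
from math import sqrt, pi   # specific names are imported directly
import math as m            # alias: the same module under a shorter name

value_normal = math.pi * math.sqrt(16)
value_direct = pi * sqrt(16)
value_alias = m.pi * m.sqrt(16)

# All three spellings refer to the same objects, so the results are identical.
print(value_normal == value_direct == value_alias)
```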

# **User-Defined Modules**

In [127]:
%%writefile mymodule.py
def greet(name):
  return f"hello, {name}"

Overwriting mymodule.py


In [128]:
import mymodule
print(mymodule.greet('ROPAR'))

hello, ROPAR
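
`%%writefile` is a notebook magic and is not available in a plain Python script. A possible equivalent outside a notebook is sketched below: the module file is written with `pathlib` into a temporary directory, which is then added to `sys.path` so the import can find it. The module name `greetings_demo` is hypothetical, chosen so it does not clash with `mymodule.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a small module file into a fresh temporary directory.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "greetings_demo.py").write_text(
    "def greet(name):\n"
    '    return f"hello, {name}"\n'
)

# Python only finds modules in directories listed on sys.path.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

# import_module("greetings_demo") does the same job as "import greetings_demo".
greetings_demo = importlib.import_module("greetings_demo")
message = greetings_demo.greet("ROPAR")
print(message)
```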


# Third-Party Modules **(pip)** ⚓

Install:

In [129]:
!pip install requests -q

Use:

In [130]:
import requests

In [131]:
res = requests.get("https://lovnishverma.in")

print(res.status_code) # 200 means OK

200
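
A request can fail for reasons unrelated to the code, such as no network or a slow server. The sketch below wraps the call in a helper that sets a timeout and catches `requests` errors. The helper name `fetch_status` and the 5-second default are assumptions made for this example.

```python
import requests


def fetch_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException as error:
        # Covers connection errors, timeouts, and malformed URLs.
        print(f"request failed: {error}")
        return None
    return response.status_code


print(fetch_status("https://lovnishverma.in"))
```

Without a `timeout`, `requests.get` can wait indefinitely on an unresponsive server, so passing one is a common precaution.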


# **Why Modules matter**

**Modules:**

* reduce duplication
* increase code organization
* enable reuse
* support collaboration

https://pypi.org/project/pip/


pandas is open source. The package is published on PyPI at https://pypi.org/project/pandas/ and the source code is on GitHub at https://github.com/pandas-dev/pandas

**What is it?**

**pandas** is a Python package (module/library) that provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way towards this goal.

**It provides two main data structures**

* Series - a 1-D labeled array

* DataFrame - a 2-D labeled table of rows and columns

`pip install pandas`

The `!` prefix is used in Jupyter Notebook or Google Colab to run a shell command from a cell.

In [132]:
!pip install pandas -q  # Google Colab already has these basic modules (libraries) installed

import pandas

In [133]:
import pandas as pd

**Pandas Series** (A series is like a column in Excel, consisting of data and an index)

In [134]:
data = [10,20,30,40]

series = pd.Series(data)

print(series)

0    10
1    20
2    30
3    40
dtype: int64


In [135]:
# using custom index
series = pd.Series([10,20,30,40], index=['a','b','c','d'])

print(series)

a    10
b    20
c    30
d    40
dtype: int64


**Accessing series element**


In [136]:
print(series['d'])

40
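
Square brackets with a label are one way to read an element. As a small sketch, the same Series can also be read explicitly by label with `.loc` or by position with `.iloc`:

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = series.loc["d"]       # label-based lookup
by_position = series.iloc[3]     # position-based lookup, counting from 0
subset = series.loc[["a", "c"]]  # a list of labels returns a smaller Series

print(by_label, by_position)
print(subset)
```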


# **Pandas DataFrame**  (A DataFrame is like an Excel spreadsheet with rows and columns)

In [137]:
data = {
    'name': ['Aman', 'Babita', 'Chaman'],
    'age': [25, 30, 35],
    'city': ['Noida', 'Sundernagar', 'Lahore']
}

df = pd.DataFrame(data)

print(df)

# Save to csv file
df.to_csv('datademo.csv', index=False)

     name  age         city
0    Aman   25        Noida
1  Babita   30  Sundernagar
2  Chaman   35       Lahore
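
Once a DataFrame exists, individual columns and rows can be picked out of it. The sketch below rebuilds the same three-row table and shows column selection and a simple row filter; the age threshold of 26 is an arbitrary value for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Aman", "Babita", "Chaman"],
    "age": [25, 30, 35],
    "city": ["Noida", "Sundernagar", "Lahore"],
})

ages = df["age"]                # a single column is a Series
older = df[df["age"] > 26]      # a boolean mask keeps only the matching rows
names = older["name"].tolist()  # convert the filtered column to a plain list

print(names)
```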




---



In [138]:
df = pd.read_csv("https://raw.githubusercontent.com/lovnishverma/datasets/refs/heads/main/shoesizedemo.csv")
df.head(10)

        name  age  hieght  weight  gender  shoesize
0       Aman   25     5.5      70    Male       8.0
1        Anu   25     4.2      55  Female       4.0
2     Sarita   20     4.1      49  Female       4.5
3     Babita   30     4.0      50  Female       5.0
4     Chaman   35     5.1      60    Male       7.0
5    Lovnish   26     5.4      75    Male       8.0
6  Ravi Kant   23     5.2      70    Male       9.0


In [139]:
df.tail()

        name  age  hieght  weight  gender  shoesize
2     Sarita   20     4.1      49  Female       4.5
3     Babita   30     4.0      50  Female       5.0
4     Chaman   35     5.1      60    Male       7.0
5    Lovnish   26     5.4      75    Male       8.0
6  Ravi Kant   23     5.2      70    Male       9.0


In [140]:
df.describe()  # by default summarizes only the numeric columns; pass include='all' to cover categorical ones too

             age    hieght     weight  shoesize
count        7.0       7.0        7.0       7.0
mean   26.285714  4.785714  61.285714       6.5
std      4.88925  0.656832  10.483547  1.979057
min         20.0       4.0       49.0       4.0
25%         24.0      4.15       52.5      4.75
50%         25.0       5.1       60.0       7.0
75%         28.0       5.3       70.0       8.0
max         35.0       5.5       75.0       9.0
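
The summary above leaves out `name` and `gender` because `describe()` skips non-numeric columns by default. As a sketch, `include="all"` asks for every column. The small DataFrame below reuses three rows from the dataset so the example runs without a download.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Aman", "Anu", "Sarita"],
    "age": [25, 25, 20],
    "gender": ["Male", "Female", "Female"],
})

numeric_summary = df.describe()            # default: numeric columns only
full_summary = df.describe(include="all")  # every column; statistics that do not apply are NaN

print(list(numeric_summary.columns))
print(list(full_summary.columns))
```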


In [141]:
df.columns

Index(['name', 'age', 'hieght', 'weight', 'gender', 'shoesize'], dtype='object')
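
The column list shows that the dataset spells one column `hieght`. One possible cleanup, sketched below on two rows copied from the output above, is to rename it with `DataFrame.rename`. The new name `height` is a choice made for this example.

```python
import pandas as pd

# Two rows copied from the dataset output, so the sketch runs without a download.
df = pd.DataFrame({
    "name": ["Aman", "Anu"],
    "hieght": [5.5, 4.2],
    "gender": ["Male", "Female"],
})

# rename returns a new DataFrame; the original df is left unchanged.
cleaned = df.rename(columns={"hieght": "height"})

print(list(cleaned.columns))
```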