### Topics Covered

#### Import Modules and Packages
- Understanding imports
- Custom modules

#### Standard Library Overview
- Commonly used libraries

#### File Operations
- Read and Write Files
- Buffered Read/Write
- Other File Methods
- Working with File Paths

#### Logging & Debugging
- Logging module
- Debugging techniques

#### Concurrency
- Multithreading
- Multiprocessing

#### Regular Expressions
- Pattern matching
- Character identifiers

### Import Modules and Packages

##### Different ways of importing a package

**1. import 'Package Name' (e.g. import math)**

In [9]:
import math
math.sqrt(16) # Calling the function 'sqrt' from the 'math' module
# When you import the whole module, prefix every function call with the module name (math.sqrt, not sqrt)

4.0

**2. from 'Package Name' import 'Function Name'**

In [13]:
from math import sqrt
sqrt(16)

# When you import a function directly from the module, you can call it without the module-name prefix

4.0

If you type `math.` and press Tab in Jupyter, you can see all the functions within the math module.

**3. Using an alias name while importing packages**

In [23]:
import numpy as np

# np is an alias for numpy, a shortcut so we don't have to type the full package name on every call

In [29]:
# If numpy is not installed on the system, you can install it with pip
!pip install numpy

# Inside a Jupyter notebook the command must start with "!", whereas plain "pip install numpy" works at the command line / terminal outside Python






In [31]:
# using one of numpy package functions
np.array([1,2,3,4,5])

array([1, 2, 3, 4, 5])

**4. Importing all functions from the package using '*'**

In [39]:
from math import *
sqrt(16)
# '*' imports every public name from the 'math' module; convenient in a notebook, but it can silently overwrite names you already defined

4.0

In [55]:
# Importing a custom library in Jupyter Notebook: first add the folder that contains the package to sys.path
import sys
sys.path.append('D:/Project_2026/data-science-learning-journey/Data/Python/')

In [59]:
# Option 1 : importing the function directly from the module
from my_custom_library.pp_lib import greet

greet('Tom Hanks')

Hello Tom Hanks, we Welcome you to our custom built Libraries


In [65]:
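# Option 2 : importing the package and accessing the function through the full dotted path
import my_custom_library.pp_lib  # import the submodule explicitly; a bare "import my_custom_library" only exposes pp_lib if the package's __init__.py imports it or it was already imported earlier

my_custom_library.pp_lib.greet('Jim Carrey')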

Hello Jim Carrey, we Welcome you to our custom built Libraries


In [67]:
# Option 3 : importing the module from the package & accessing the function
from my_custom_library import pp_lib

pp_lib.greet('Tom Cruise')


Hello Tom Cruise, we Welcome you to our custom built Libraries


In [69]:
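# Now importing every function from the same module and calling them

from my_custom_library.pp_lib import *

print(greet('Elon Musk'))  # greet() prints its message itself and returns None, so print() also shows "None"
print(addition(12,24))
print(substraction(42,24))
print(multiply(10,20))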

Hello Elon Musk, we Welcome you to our custom built Libraries
None
36
18
200


In [71]:
# We created a new sub folder (subpackages) under our package (my_custom_library) and added a new module (pp_sub_lib) to it
# We will now import from the subpackage and call its greet function

from my_custom_library.subpackages.pp_sub_lib import greet

greet('Elon Musk')

Hello Elon Musk, we Welcome you to our custom built Libraries, You are getting this message from our Subpackage function
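
The folder layout behind a custom package is not shown above, so here is a minimal, self-contained sketch of it. The names `demo_pkg`, `helpers` and this `greet` are made up for illustration and are not the author's `my_custom_library`; the package is written to a temporary folder, which is then added to `sys.path` in the same way as in the cells above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (demo_pkg and helpers are hypothetical names).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # marks the folder as a regular package
(pkg / "helpers.py").write_text(
    "def greet(name):\n"
    "    return f'Hello {name}'\n"
)

# Make the parent folder importable, like sys.path.append in the notebook.
sys.path.append(str(root))
importlib.invalidate_caches()  # make sure the freshly written files are noticed

from demo_pkg.helpers import greet

message = greet("Tom Hanks")
print(message)
```

Because this `greet` returns its message instead of printing it, wrapping the call in `print()` shows the text rather than `None`.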


##### Conclusion
Importing modules and packages in Python allows you to organize your code, reuse functionality, and keep your projects clean and manageable. By understanding how to import whole modules, specific functions, and your own packages and subpackages, you can structure your Python applications more effectively.

#### Standard Library Overview
Python's Standard Library is a vast collection of modules and packages that come bundled with Python, providing a wide range of functionalities out of the box. Here's an overview of some of the most commonly used modules and packages in the Python Standard Library.

**1. array**
- Provides space-efficient arrays of basic C types (like integers, floats).
- Unlike lists, arrays store elements of the same type.
- Useful when performance and memory efficiency matter

In [20]:
import array # This module provides an array object that is more memory-efficient than a list, but requires all elements to be of the same type

arr=array.array('i',[1,2,3,4]) # Type code 'i' means signed integer (usually 4 bytes, depending on the platform). The second argument [1, 2, 3, 4] is the initializer list
# Other type codes exist, e.g. 'f' for float, 'd' for double, 'u' for Unicode character (deprecated), etc.
# So arr becomes an array of integers: [1, 2, 3, 4].

print(arr)  # In short: you are creating a typed array of integers and printing its representation

array('i', [1, 2, 3, 4])


In [28]:
# Key Difference vs. List
#  List: Flexible, can hold mixed types, but less memory-efficient.
#  Array: Fixed type, more compact in memory, closer to C-style arrays.

lst = [1, 2, 3, 4]   # regular Python list
arr = array.array('i', [1, 2, 3, 4])  # typed array
print(lst)
print(arr)

[1, 2, 3, 4]
array('i', [1, 2, 3, 4])
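
To make the "fixed type" point concrete, the sketch below shows what happens when a value of the wrong type is appended. It relies only on the standard `array` module; the variable names are illustrative.

```python
import array

arr = array.array('i', [1, 2, 3, 4])

# Every element occupies the same number of bytes (the exact size is platform dependent).
total_bytes = arr.itemsize * len(arr)
print(arr.itemsize, total_bytes)

# A list happily accepts mixed types ...
lst = [1, 2, 3, 4]
lst.append("five")

# ... but a typed array rejects anything that is not an integer.
try:
    arr.append(5.5)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```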


**2. math**
- Offers mathematical functions: sqrt, log, sin, cos, factorials, etc.
- Includes constants like pi and e.
- Operates on numbers (not arrays).


In [32]:
import math
print(math.sqrt(16))
print(math.pi)

4.0
3.141592653589793


**3. random**
- Generates pseudo-random numbers.
- Functions: random(), randint(), choice(), shuffle().
- Useful for simulations, games, sampling, and randomized algorithms.

In [37]:
## random - one of the most commonly used libraries

import random
print(random.randint(1,10)) # returns a random integer from 1 to 10, both ends included
print(random.choice(['apple','banana','cherry'])) # picks one random item from the given list

8
banana
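
The values above change on every run. When a simulation or test needs repeatable results, the generator can be seeded. A small sketch follows; the seed value 42 is an arbitrary choice. It also shows `shuffle()`, which is listed above, and `random.sample()`.

```python
import random

random.seed(42)  # fix the starting state of the generator
first_run = [random.randint(1, 10) for _ in range(5)]

random.seed(42)  # same seed again, so the same sequence follows
second_run = [random.randint(1, 10) for _ in range(5)]

print(first_run == second_run)

# shuffle() reorders a list in place; sample() picks unique items without changing the list
deck = list(range(1, 11))
random.shuffle(deck)
picked = random.sample(deck, 3)
print(sorted(deck) == list(range(1, 11)), len(picked))
```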


**4. os**
- Interfaces with the operating system.
- File and directory operations (os.listdir, os.remove).
- Environment variables, process management, path handling.

In [50]:
### File And Directory Access

import os
print(os.getcwd())
#getcwd - get current working directory , same as pwd

D:\Project_2026\data-science-learning-journey\Topics\Python


In [44]:
pwd

'D:\\Project_2026\\data-science-learning-journey\\Topics\\Python'

In [52]:
os.mkdir('test2_dir')
# mkdir - creates a directory under the current working directory; raises FileExistsError if it already exists

**5. shutil**
- High-level file operations.
- Copying, moving, deleting files and directories.
- Archive handling (make_archive, unpack_archive).

In [59]:
## High-level operations on files and collections of files
import shutil
shutil.copyfile('source.txt','destination.txt')
# Copies the contents of source.txt into destination.txt (created if missing, overwritten if it exists); the source file is left unchanged

'destination.txt'

**6. json**
- Encode/decode JSON data.
- json.dumps → Python object → JSON string.
- json.loads → JSON string → Python object.
- Essential for APIs and config files.

In [68]:
## Data Serialization
# Working with Python's json module to convert a Python dictionary into a JSON string and then back into a dictionary!

import json
data={'name':'Praveen','age':45} # This is a normal Python dictionary containing key-value pairs

json_str=json.dumps(data) # - json.dumps(data) serializes (converts) the Python dictionary into a JSON-formatted string
print(json_str)
print(type(json_str))

parsed_data=json.loads(json_str) # - json.loads(json_str) deserializes (converts) the JSON string back into a Python dictionary.
print(parsed_data)
print(type(parsed_data))

# - JSON (JavaScript Object Notation) is a lightweight data format used for exchanging data between systems.
# - json.dumps() is useful for storing and sending data in JSON format (e.g., APIs, databases).
# - json.loads() parses JSON text so you can work with it as Python objects.

{"name": "Praveen", "age": 45}
<class 'str'>
{'name': 'Praveen', 'age': 45}
<class 'dict'>

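
A round trip through JSON does not always return exactly the same Python types. The sketch below uses a made-up `record` to show readable output with `indent` and `sort_keys`, and what happens to tuples, booleans and `None`.

```python
import json

# Hypothetical record, used only for illustration
record = {"name": "Praveen", "age": 45, "scores": (90, 85), "active": True, "manager": None}

pretty = json.dumps(record, indent=2, sort_keys=True)  # indented, keys in alphabetical order
print(pretty)

restored = json.loads(pretty)
print(restored["scores"])                       # the tuple comes back as a list
print(restored["active"], restored["manager"])  # true/null map back to True/None
```

JSON has no tuple type, so `(90, 85)` is written as an array and read back as the list `[90, 85]`.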

**7. csv**
- Read/write CSV files.
- csv.reader and csv.writer for structured tabular data.
- Common in data science and ETL workflows.

In [89]:
import csv # This module reads from and writes to CSV (Comma-Separated Values) files

# Opens (or creates, if not present) a file named 'example.csv' in write mode ('w').
# newline='' ensures rows are written without extra blank lines (important on Windows).
# 'with' is a context manager: it automatically closes the file when the block ends.
with open('example.csv',mode='w',newline='') as file:
    writer=csv.writer(file) # Creates a CSV writer object linked to the file; it formats Python lists as CSV rows
    writer.writerow(['name','age']) # Writes a single row; each element in the list becomes a column
    writer.writerow(['Prisha',11]) # Now the file contains two rows: a header and one data row

# Reopens the same file, this time in read mode ('r'); newline='' is the recommended setting for csv files when reading too
with open('example.csv',mode='r',newline='') as file:
    reader=csv.reader(file) # Creates a CSV reader object that returns each row as a list of strings
    for row in reader: # Iterates through each row in the CSV file
        print(row)

['name', 'age']
['Prisha', '11']


- Notice that '11' is read back as a string, not an integer. CSV files don’t store data types, only text.
- If you want to convert '11' back to an integer, you’d need to cast it manually:
int(row[1])

So in summary:
- First block writes a CSV file with a header and one row.
- Second block reads the file back and prints each row as a list of strings
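
The manual cast mentioned above can be combined with `csv.DictReader`, which uses the header row as dictionary keys. This is a sketch only: it writes to an in-memory `io.StringIO` buffer instead of `example.csv` so that it runs without touching the disk.

```python
import csv
import io

buffer = io.StringIO(newline="")  # in-memory stand-in for a file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerow({"name": "Prisha", "age": 11})

buffer.seek(0)  # rewind before reading
rows = []
for row in csv.DictReader(buffer):
    row["age"] = int(row["age"])  # CSV stores text only, so convert explicitly
    rows.append(row)
print(rows)
```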

**8. datetime**
- Work with dates and times.
- Classes: date, time, datetime, timedelta.
- Formatting and parsing with strftime and strptime.

In [103]:
from datetime import datetime,timedelta

now=datetime.now() # - Returns the current local date and time as a datetime object.
print(now)

yesterday=now-timedelta(days=1) # - timedelta(days=1) creates a time difference of 1 day
print(yesterday)

2026-01-16 17:15:10.031308
2026-01-15 17:15:10.031308
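
`strftime` and `strptime` are listed above but not demonstrated. Here is a short sketch; the timestamp string and the format codes are chosen purely for illustration.

```python
from datetime import datetime, timedelta

# strptime: text -> datetime, following the given format
stamp = datetime.strptime("2026-01-16 17:15:10", "%Y-%m-%d %H:%M:%S")

# strftime: datetime -> text, in any layout you choose
formatted = stamp.strftime("%d/%m/%Y %H:%M")
print(formatted)

# timedelta arithmetic works in both directions
next_week = stamp + timedelta(days=7)
elapsed = next_week - stamp
print(next_week.date().isoformat(), elapsed.days)
```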


**9. time**
- Time-related functions.
- time.time() → current epoch time.
- sleep() for delays.
- Lower-level than datetime.

In [111]:
import time
print(time.time())
time.sleep(2)

print(time.time())
# show current time, pause/sleep for 2 secs and show time again
# - The integer part is seconds, the decimal part is fractions of a second.

1768564140.6929834
1768564142.69378


In [117]:
# Lets convert these epoch timestamps into human-readable format
print("Raw epoch time",time.time())
print("Time in Readable format :", time.ctime(time.time()))  # human-readable
time.sleep(2)
print("Break of 2 Secs")
print("Time after pause :", time.ctime(time.time()))  # human-readable

Raw epoch time 1768564326.732182
Time in Readable format : Fri Jan 16 17:22:06 2026
Break of 2 Secs
Time after pause : Fri Jan 16 17:22:08 2026
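
Two related tasks come up often: measuring how long something took, and turning an epoch value into a `datetime` object. A sketch under simple assumptions (the 0.2 second pause is arbitrary; `time.perf_counter()` is used because it is intended for measuring intervals):

```python
import time
from datetime import datetime, timezone

start = time.perf_counter()  # high-resolution clock for measuring intervals
time.sleep(0.2)
elapsed = time.perf_counter() - start
print(f"Slept for about {elapsed:.2f} seconds")

# Epoch time counts seconds since 1970-01-01 00:00:00 UTC
epoch_start = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch_start.isoformat())

# The current epoch value as a timezone-aware datetime
now_utc = datetime.fromtimestamp(time.time(), tz=timezone.utc)
print(now_utc)
```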


**10. re**
- Regular expressions.
- Pattern matching, searching, replacing text.
- Functions: match, search, findall, sub.

In [128]:
## Regular expressions - using the re module to search for digits (\d+) in a given text string
import re # Python's built-in regular expressions library, which allows you to work with pattern matching

pattern=r'\d+' # \d+ matches one or more digits (0-9). The r prefix makes it a raw string, so Python
# passes the backslash to the regex engine instead of treating it as a string escape.
text='There are 123 apples 456' 
match=re.search(pattern,text)
print("For first numeric match only :",match.group()) 
print("For all numeric match across the text :", re.findall(pattern, text)) 

# re.search() scans the string for the first location where the pattern matches and
# returns a Match object, or None if no match was found.
# The text contains the numbers 123 and 456, but search() returns only the first match, "123".
# .group() retrieves the actual text that matched the pattern.

# Key Takeaways
# - \d+ matches runs of digits.
# - re.search() finds the first match only.
# - re.findall() returns every match as a list: ['123', '456'].
# - re.finditer() yields Match objects one by one when you also need the positions.

For first numeric match only : 123
For all numeric match across the text : ['123', '456']


In [73]:
# Regular expressions (regex) let us search for general patterns in text data (text mining)
# Python's built-in re library lets us create specialised pattern strings and search for matches within text
# e.g. for a phone number like (555)-555-5555, a regex pattern is r"\(\d\d\d\)-\d\d\d-\d\d\d\d"
# A plain substring search with 'in' only checks for an exact piece of text and returns True or False

text = " The agent's phone number is  123-234-4567. Call Soon!! "

In [75]:
# A plain text search returns True on an exact match
'phone' in text

True

In [77]:
# Importing the re module
import re

In [79]:
# create search text
pattern = 'phone'

In [81]:
# Pass the search pattern and the source text
re.search(pattern,text)
# It confirms the match and reports its index location, e.g. it starts at index 13 and ends just before index 18

<re.Match object; span=(13, 18), match='phone'>

In [83]:
# now enter some other text which is not available in source text
pattern = 'wrong keyword'

In [85]:
# Since there is no match, re.search returns None, so the cell executes without displaying any result
re.search(pattern,text)

In [87]:
# Adding another pattern which is available in the text
pattern2 = '123-234-4567'

In [89]:
re.search(pattern2,text)

<re.Match object; span=(30, 42), match='123-234-4567'>

In [91]:
# Assigning it to variable 
mymatch=re.search(pattern2,text)

In [93]:
mymatch

<re.Match object; span=(30, 42), match='123-234-4567'>

In [95]:
# You can also report the location using span
mymatch.span()

(30, 42)

In [97]:
# ask for the starting index value
mymatch.start()

30

In [99]:
# ask for the end index value
mymatch.end()

42

In [101]:
# re.search finds and returns only the first match. For multiple matches there are a few more functions
nexttext= "Hi Prisha, where is Gemma? Why are you quiet Prisha? Answer me Prisha !!"

In [107]:
# Testing with re.search to look for name
Singlematch=re.search('Prisha',nexttext)
Singlematch.span()

(3, 9)

In [109]:
# Find multiple match 
groupmatch = re.findall('Prisha',nexttext)
groupmatch

['Prisha', 'Prisha', 'Prisha']

In [111]:
# to count the occurrences
len(groupmatch)

3

In [113]:
# Using finditer in a for loop to iterate through the text and report every match together with its position

for match in re.finditer('Prisha',nexttext):
    print(match)

<re.Match object; span=(3, 9), match='Prisha'>
<re.Match object; span=(45, 51), match='Prisha'>
<re.Match object; span=(63, 69), match='Prisha'>


In [115]:
# for displaying only the location
for match in re.finditer ('Prisha',nexttext):
    print(match.span())

(3, 9)
(45, 51)
(63, 69)


In [117]:
# for displaying just the text 
for match in re.finditer ('Prisha',nexttext):
    print(match.group())

Prisha
Prisha
Prisha
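
Groups and `re.sub` are mentioned above but not shown in action. The sketch below reuses the phone-number text from the earlier cells; the pattern and the masking format are illustrative choices, not the only way to write them.

```python
import re

text = " The agent's phone number is  123-234-4567. Call Soon!! "

# Parentheses create groups, so each part of the number can be read separately
phone_pattern = r"(\d{3})-(\d{3})-(\d{4})"
match = re.search(phone_pattern, text)

whole = match.group()    # the full match
area = match.group(1)    # the first group only
parts = match.groups()   # all groups as a tuple
print(whole, area, parts)

# re.sub replaces every match; \1 refers back to the first group of this pattern
masked = re.sub(r"\d{3}-\d{3}-(\d{4})", r"***-***-\1", text)
print(masked.strip())
```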


#### Other Key Standard Libraries available in Python

Here are some of the **other commonly used modules**:

| Library       | Purpose                                                                 |
|---------------|-------------------------------------------------------------------------|
| **sys**       | Access to interpreter variables, command-line args, exit codes.         |
| **pathlib**   | Object-oriented filesystem paths (modern alternative to `os.path`).     |
| **collections** | Specialized data structures (`Counter`, `defaultdict`, `deque`, `namedtuple`). |
| **itertools** | Iterators for combinatorics, infinite sequences, efficient looping.     |
| **functools** | Higher-order functions (`lru_cache`, `partial`, `reduce`).              |
| **statistics** | Basic statistical functions (mean, median, stdev).                     |
| **logging**   | Flexible logging system for applications.                               |
| **subprocess** | Run and manage external processes.                                     |
| **threading / multiprocessing** | Concurrency and parallelism.                          |
| **typing**    | Type hints for modern Python codebases.                                 |
| **http / urllib** | Networking, HTTP requests, URL handling.                            |
| **decimal / fractions** | Precise arithmetic beyond floats.                             |
| **sqlite3**   | Built-in lightweight database engine.                                   |

##### Conclusion
Python's Standard Library is extensive and provides tools for almost any task you can think of, from file handling to web services, from data serialization to concurrent execution. Familiarizing yourself with the modules and packages available in the Standard Library can significantly enhance your ability to write efficient and effective Python programs.

### Accessing Files & Folders

In [149]:
pwd

'D:\\Project_2026\\data-science-learning-journey\\Topics\\Python'

In [151]:
# Importing the os module, which helps with getting the current working directory, listing the files in a directory, etc.
import os

# os.getcwd() is the Python equivalent of pwd (pwd is a Jupyter/IPython magic command)
os.getcwd()

'D:\\Project_2026\\data-science-learning-journey\\Topics\\Python'

In [153]:
# Creating a temp file (it does not exist yet; 'w+' creates it)

p=open('Python_tempfile.txt','w+')
p.write('This is a Test String')
p.close()

In [157]:
# To list everything under the current directory
os.listdir()

['.ipynb_checkpoints',
 '1. Python - Syntax & Semantics, Core Building Bocks , Operators.ipynb',
 '2. Python - Control Flow, Data Structures & Functions.ipynb',
 '3. Python - Object-Oriented Programming & Design.ipynb',
 '4. Python Advanced Modules & Libraries.ipynb',
 'backup',
 'Python Excercises.ipynb']

In [161]:
# To List directory for specific folder
os.listdir('D:\\Project_2026\\data-science-learning-journey')

['.git',
 '.gitignore',
 'Cheatsheets',
 'Data',
 'Images',
 'LICENSE',
 'Projects',
 'README.md',
 'Resources',
 'Topics']

In [163]:
# Import module to help move files between folders, diff locations
# Syntax : shutil.move(source,destination)

import shutil

In [192]:
# Before we move a file, let's first create a temp file
f=open("Python_tempfile.txt",'w')

In [194]:
# If we try to move the file now, on Windows we get the error "The process cannot access the file because it is being used by another process: 'Python_tempfile.txt'"
# Let's first write something and then close the file
# write() writes a string into the file
f.write("This is my first write operation, in a empty file that i just created")

# It returns the number of characters written

69

In [196]:
# But if you check the file now, it may still look blank, because the text sits in the write buffer until the file is flushed or closed
f.close()
# Now the above contents are visible in the temp file

In [198]:
# File successfully moved to new location
shutil.move('Python_tempfile.txt','D:\\Project_2026\\data-science-learning-journey')

'D:\\Project_2026\\data-science-learning-journey\\Python_tempfile.txt'

In [202]:
# Moving the temp file back from data-science-learning-journey to my current directory
shutil.move('D:\\Project_2026\\data-science-learning-journey\\Python_tempfile.txt',os.getcwd())

'D:\\Project_2026\\data-science-learning-journey\\Topics\\Python\\Python_tempfile.txt'

In [208]:
# There you see the file back in its place
os.listdir()

['.ipynb_checkpoints',
 '1. Python - Syntax & Semantics, Core Building Bocks , Operators.ipynb',
 '2. Python - Control Flow, Data Structures & Functions.ipynb',
 '3. Python - Object-Oriented Programming & Design.ipynb',
 '4. Python Advanced Modules & Libraries.ipynb',
 'backup',
 'Python Excercises.ipynb',
 'Python_tempfile.txt']

#### File Operations - Read and Write Files

File handling is a crucial part of any programming language. Python provides built-in functions and methods to read from and write to files, both text and binary. This lesson will cover the basics of file handling, including reading and writing text files and binary files.

Using **with** open(...) is best practice because:
- Automatic cleanup: it ensures resources are released properly (e.g., files are closed, locks are released).
- Error safety: even if an exception occurs inside the block, cleanup still happens, which prevents resource leaks.
- Cleaner code: no need to manually call close() or release().

In short: use **with** whenever you’re working with something that needs setup and teardown. It’s Python’s way of saying: “I’ll handle the housekeeping for you.”
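
To see what **with** saves you from writing, the sketch below compares a manual `try`/`finally` version with the context-manager version. It writes to a file in a temporary folder; the file name is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

# Manual version: close() has to be guaranteed with try/finally.
f = open(path, "w")
try:
    f.write("written the manual way\n")
finally:
    f.close()

# Same guarantee with a context manager, in fewer lines.
with open(path, "a") as g:
    g.write("written inside a with block\n")

print(f.closed, g.closed)

# The file is closed even if the block raises an exception.
try:
    with open(path) as h:
        raise ValueError("simulated failure")
except ValueError:
    pass
print(h.closed)
```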

In [220]:
### Read a Whole File

with open('Python_tempfile.txt','r') as file:
    content=file.read()
    print(content)

Hello Reader,
How are you doing
My Name is Praveen 
Welcome to my course


In [222]:
## Read a file line by line
with open('Python_tempfile.txt','r') as file:
    for line in file:
        print(line)

Hello Reader,

How are you doing

My Name is Praveen 

Welcome to my course


In [226]:
## Using strip() to remove the trailing newline character (and any other leading or trailing whitespace)
with open('Python_tempfile.txt','r') as file:
    for line in file:
         print(line.strip()) 

Hello Reader,
How are you doing
My Name is Praveen
Welcome to my course


In [231]:
## Writing a file(Overwriting)

with open('Python_tempfile.txt','w') as file:
    file.write('Sorry we are using W to overwrite this file\n')
    file.write('Previous content are lost.')

In [1]:
with open('Python_tempfile.txt','r') as file:
    for line in file:
         print(line.strip()) 

Sorry we are using W to overwrite this file
Previous content are lost.


In [13]:
## Writing to a file without overwriting (append mode)
with open('Python_tempfile.txt','a') as file:
    file.write("\n We have now used Append operation to append the file without overwriting!\n")

In [15]:
with open('Python_tempfile.txt','r') as file:
    for line in file:
         print(line.strip()) 

Sorry we are using W to overwrite this file
Previous content are lost.

We have now used Append operation to append the file without overwriting!


In [17]:
### Writing a list of lines to a file
lines=['First line \n','Second line \n','Third line\n']
with open('Python_tempfile.txt','a') as file:
    file.writelines(lines)

In [19]:
with open('Python_tempfile.txt','r') as file:
    for line in file:
         print(line.strip()) 

Sorry we are using W to overwrite this file
Previous content are lost.

We have now used Append operation to append the file without overwriting!
First line
Second line
Third line


In [21]:
### Binary Files

# Writing to a binary file
data = b'\x00\x01\x02\x03\x04'
with open('Binary_tempfile.bin', 'wb') as file:
    file.write(data)

In [25]:
# Reading a binary file
with open('Binary_tempfile.bin', 'rb') as file:
    content = file.read()
    print(content)

b'\x00\x01\x02\x03\x04'


In [29]:
### Read the content from a source text file and write to a destination text file
# Copying a text file
with open('Python_tempfile.txt', 'r') as source_file:
    content = source_file.read()

# Writing in Destination file
with open('Python_S2D_tempfile.txt','w') as destination_file:
    destination_file.write(content)

In [31]:
# Lets review the content from destination file
with open('Python_S2D_tempfile.txt','r')as file :
    for line in file:
        print(line.strip())

Sorry we are using W to overwrite this file
Previous content are lost.

We have now used Append operation to append the file without overwriting!
First line
Second line
Third line


In [215]:
# Counting lines, words, and characters in a text file
def count_text_file(file_path):
    with open(file_path, 'r') as file:
        lines = file.readlines() # returns a list with one string per line
        line_count = len(lines) # counts how many lines
        word_count = sum(len(line.split()) for line in lines) # splits each line into words and sums the counts using a generator expression
        char_count = sum(len(line) for line in lines) # counts all characters, including spaces and the newline \n
    return line_count, word_count, char_count

file_path = 'Python_tempfile.txt'
lines, words, characters = count_text_file(file_path)
print(f'Lines: {lines}, Words: {words}, Characters: {characters}')

Lines: 7, Words: 31, Characters: 183


In [71]:
### Writing and then reading a file

with open('Read_write_file.txt','w+') as file: # 'w' creates or truncates the file for writing, and '+' additionally allows reading from it
    file.write("Hello world\n")
    file.write("This is a new line \n")

    ## Move the file cursor to the beginning
    file.seek(0)

    ## Read the content of the file
    content=file.read()
    print(content)

Hello world
This is a new line 
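
The file cursor moved by `seek()` can also be inspected with `tell()`. Here is a small sketch of these other file methods, using a temporary file so that the notebook's own files stay untouched.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "cursor_demo.txt")

with open(path, "w+") as file:
    file.write("Hello world\n")
    end_position = file.tell()    # cursor position right after the written text
    file.seek(0)                  # jump back to the beginning
    first_word = file.read(5)     # read only the first five characters
    file.seek(0)
    first_line = file.readline()  # read up to and including the newline

print(end_position, first_word, repr(first_line))
```

In text mode the number returned by `tell()` is best treated as an opaque bookmark to pass back to `seek()`, because its value can differ between platforms.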



In [121]:
# os.path.getsize returns the file size in bytes (190 bytes here)
import os
os.path.getsize("Python_tempfile.txt")

190

In [123]:
# Copying one file to another
import shutil
shutil.copy("Python_tempfile.txt","copy_of_Python_tempfile.txt")

'copy_of_Python_tempfile.txt'

In [127]:
with open('copy_of_Python_tempfile.txt', 'r') as file:
    p=file.read()
    print(p)

Sorry we are using W to overwrite this file
Previous content are lost.

 We have now used Append operation to append the file without overwriting!
First line 
Second line 
Third line



In [137]:
# Another way to check the contents, without 'with': the file must then be closed manually
p2=open('copy_of_Python_tempfile.txt', 'r')
contents=p2.read()
p2.close()
contents

'Sorry we are using W to overwrite this file\nPrevious content are lost.\n\n We have now used Append operation to append the file without overwriting!\nFirst line \nSecond line \nThird line\n'

In [161]:
# Let's define a dictionary that we will write to a JSON file
data = {
    "name" : "sudh",
    "mail_id" : "sudh@gmail.com",
    "phone_number" : 91345435,
    "subject" :["data science" , "big data" , "data analytics"]
}

In [163]:
# Using the write operation (dump)
import json # JSON : JavaScript Object Notation

with open("Json_test_file.json","w") as f : # f is the name bound to the open file object
    json.dump(data, f) # json.dump serializes the dictionary and writes it to the file f

In [165]:
# Using the read operation (load)
with open ("Json_test_file.json","r") as d:
    transfer1=json.load(d) # json.load parses the file and returns a Python object, which we assign to a new variable

In [167]:
transfer1

{'name': 'sudh',
 'mail_id': 'sudh@gmail.com',
 'phone_number': 91345435,
 'subject': ['data science', 'big data', 'data analytics']}

In [169]:
type(transfer1)

dict

In [173]:
# Let's access the value "big data" from the dictionary above
transfer1['subject'][1]

'big data'

In [175]:
# How to store a nested list (a list of rows) in a .csv file
data = [["name" , "email_id" , "number"],
       ["sudh" , "sudh@gmail.com" , 92342342],
        ["krish" , "krish@gmail.com" , 9324324242]
       ]

In [177]:
import csv
with open("Dictionary_to_csv.csv","w",newline="") as a : # newline="" prevents the blank rows that csv.writer otherwise produces on Windows
    p=csv.writer(a)
    for i in data:
        p.writerow(i)
        
# Reading and writing tabular data is much simpler with Pandas

In [179]:
# Let's read the .csv file back

with open("Dictionary_to_csv.csv","r",newline="") as b:
    read=csv.reader(b)
    for i in read:
        print(i)

['name', 'email_id', 'number']
['sudh', 'sudh@gmail.com', '92342342']
['krish', 'krish@gmail.com', '9324324242']


In [185]:
# Let's now write and read a binary file. Audio, video and images are typical examples of binary data


with open("Binary_file.bin","wb")as l: # ".bin" is a common extension for binary files; "wb" opens the file for writing in binary mode
     l.write(b"\x01\x02\x03")   # A bytes literal starts with b; \x01 is the byte with hexadecimal value 01
        
        
# If you open this file in a text editor, you will see three placeholder symbols (dots or boxes), because bytes 1, 2 and 3 are non-printable control characters

In [187]:
# Lets read the binary file
with open("Binary_file.bin","rb")as l:
    print(l.read())

b'\x01\x02\x03'


#### Buffered Read and Write, Other File Methods

Buffered read/write means data is handled in chunks through a temporary memory area (a buffer) instead of one byte or character at a time. This makes input/output operations much faster and more efficient.



**What Buffering Is**
- Without buffering: Every read or write request goes directly to disk or network. These operations are slow because they involve hardware.
- With buffering: Data is first stored in memory (RAM) in a buffer. Reads and writes happen in larger blocks, reducing the number of system calls.
Think of it like reading a book:
- Without buffering → you read one word at a time.
- With buffering → you grab a whole paragraph into memory, then read from it.

Advantages
- Performance boost: Disk I/O is slow; buffering minimizes calls.
- Tunable memory use: The buffer size can be adjusted (e.g., 1 KB, 4 KB) to trade a little memory for fewer I/O calls.
- Automatic cleanup: When using 'with', buffers are flushed and files closed safely.
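
The idea of working in blocks can also be applied by hand: instead of loading a large file with a single `read()`, read it one chunk at a time. This is a minimal sketch; the 4096-byte chunk size and the 10,000-byte test file are arbitrary choices.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "chunk_demo.bin")
with open(path, "wb") as f:
    f.write(b"x" * 10_000)  # 10,000 bytes of sample data

chunk_size = 4096
chunk_lengths = []
with open(path, "rb", buffering=chunk_size) as f:  # buffering sets the buffer size in bytes
    while True:
        chunk = f.read(chunk_size)  # returns at most chunk_size bytes
        if not chunk:               # an empty bytes object means end of file
            break
        chunk_lengths.append(len(chunk))

print(chunk_lengths)
```

Only one chunk is held in memory at a time, which is what keeps this pattern usable for files far larger than the available RAM.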

In [205]:
# With a buffered writer, data is collected in memory and written to disk in chunks
import io

with open("BufferWrite_Textfile.txt", "wb") as f : # wb → write in binary mode (so you must use b"..." for bytes)
    file = io.BufferedWriter(f) # Wraps the file in an explicit buffer layer, so writes happen in chunks (faster, fewer system calls). Note: open() in binary mode already returns a buffered object; the explicit wrapper is used here for illustration
    # The file is opened in 'wb' (write binary) mode, so write() expects bytes, not text strings. That is why the data below is written as b"..."
    file.write(b"Data Science Masters course is highly curated and uniquely designed according to the latest industry standards. This program instills students the skills essential to knowledge discovery efforts to identify standard, novel, and truly differentiated solutions and decision-making, including skills in managing, querying, analyzing, visualizing, and extracting meaning from extremely large data sets. This trending program provides students with the statistical, mathematical and computational skills needed to meet the large-scale data science challenges of today's professional world. You will learn all the stack required to work in data science industry including cloud infrastructure and real-time industry projects. This course will be taught in Hindi language.\n")
    file.write(b"this is my second line that i am trying to write") # You are writing raw bytes. Each call goes into the buffer, not directly to disk
    file.flush() # Forces buffered data to be written out immediately. Without flush, data might stay in memory until the buffer fills or the writer is closed


In [208]:
with open("BufferWrite_Textfile.txt" , "rb") as f :
    file = io.BufferedReader(f) # Adds buffering, so reads happen in chunks (efficient for large files).
    data = file.read()
    print(data)

b"Data Science Masters course is highly curated and uniquely designed according to the latest industry standards. This program instills students the skills essential to knowledge discovery efforts to identify standard, novel, and truly differentiated solutions and decision-making, including skills in managing, querying, analyzing, visualizing, and extracting meaning from extremely large data sets. This trending program provides students with the statistical, mathematical and computational skills needed to meet the large-scale data science challenges of today's professional world. You will learn all the stack required to work in data science industry including cloud infrastructure and real-time industry projects. This course will be taught in Hindi language.\nthis is my second line that i am trying to write"


Why .read() is needed
- A BufferedReader is a wrapper around the file stream.
- Printing the BufferedReader object itself only shows a description of the object, not the file's contents.
- .read() (or .readline(), .readlines()) tells Python: “Go into the buffer, fetch the data, and give it back to me.”
- Once you call .read(), you get the contents (bytes here, because the file was opened in binary mode), which you can then print.

When to Use Each
- Binary mode (wb, rb) + b"..."
→ When dealing with raw data (images, PDFs, network streams, or when you want exact byte control).
- Text mode (w, r) + "..."
→ When dealing with human-readable text files.
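
The difference between the two modes can be seen by reading the same file both ways. This is a minimal sketch, not part of the original notebook: the file name `mode_demo.txt` and the sample text are assumptions, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

# Hypothetical file inside a throwaway directory, used only for this demo.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "mode_demo.txt")

    # Text mode: we write and read str; Python encodes and decodes for us.
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("café")
    with open(demo_path, "r", encoding="utf-8") as f:
        text_data = f.read()

    # Binary mode: the same file comes back as raw bytes, with no decoding.
    with open(demo_path, "rb") as f:
        byte_data = f.read()

print(type(text_data), repr(text_data))
print(type(byte_data), repr(byte_data))
```

The text read returns four characters, while the binary read returns five bytes, because "é" takes two bytes in UTF-8.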

In [212]:
with open("BufferWrite_Textfile.txt" , "rb") as f :
    file = io.BufferedReader(f)
    data = file.read(100) # Reads up to 100 bytes from the file
    print(data)

b'Data Science Masters course is highly curated and uniquely designed according to the latest industry'


#### Working With File Paths
When working with files in Python, handling file paths correctly is crucial to ensure your code works across different operating systems and environments. Python provides several modules and functions for working with file paths effectively.

In [5]:
# The os module can report the current working directory
import os
cwd=os.getcwd()
print(f"Current working directory is {cwd}")

Current working directory is D:\Project_2026\data-science-learning-journey\Topics\Python


In [7]:
pwd

'D:\\Project_2026\\data-science-learning-journey\\Topics\\Python'

In [9]:
# Creates a new directory named 'package' in the current working directory
new_directory="package" # Name of the directory to create
os.mkdir(new_directory) # os.mkdir(path) creates a single directory at the given path; it raises FileExistsError if the directory already exists
print(f"Directory '{new_directory}' created")

Directory 'package' created


In [11]:
# If you want to avoid errors when the directory already exists, use os.makedirs with exist_ok=True
import os

new_directory = "package"
os.makedirs(new_directory, exist_ok=True)
print(f"Directory '{new_directory}' created or already exists")

Directory 'package' created or already exists


In [17]:
## Listing files and directories in the current working directory ('D:\\Project_2026\\data-science-learning-journey\\Topics\\Python')
items=os.listdir('.')
print(items)
print(type(items))

['.ipynb_checkpoints', '1. Python - Syntax & Semantics, Core Building Bocks , Operators.ipynb', '2. Python - Control Flow, Data Structures & Functions.ipynb', '3. Python - Object-Oriented Programming & Design.ipynb', '4. Python Advanced Modules & Libraries.ipynb', 'backup', 'Binary_file.bin', 'Binary_tempfile.bin', 'BufferWrite_Textfile.txt', 'copy_of_Python_tempfile.txt', 'Dictionary_to_csv.csv', 'example.txt', 'Json_test_file.json', 'package', 'Python Excercises.ipynb', 'Python_S2D_tempfile.txt', 'Python_tempfile.txt', 'Read_write_file.txt']
<class 'list'>


In [19]:
items = os.listdir('.')
for item in items:
    print(item)  # Prints each item on a new line

# When you pass '.' as an argument, Python lists all files and directories inside the current working directory.

.ipynb_checkpoints
1. Python - Syntax & Semantics, Core Building Bocks , Operators.ipynb
2. Python - Control Flow, Data Structures & Functions.ipynb
3. Python - Object-Oriented Programming & Design.ipynb
4. Python Advanced Modules & Libraries.ipynb
backup
Binary_file.bin
Binary_tempfile.bin
BufferWrite_Textfile.txt
copy_of_Python_tempfile.txt
Dictionary_to_csv.csv
example.txt
Json_test_file.json
package
Python Excercises.ipynb
Python_S2D_tempfile.txt
Python_tempfile.txt
Read_write_file.txt


In [21]:
# To list items in a different directory, replace '.' with the desired path.
# This works, but the single backslashes trigger an invalid-escape warning (explained in the next cell).
os.listdir('D:\My_Projects\study_hub')  # Lists items in the specified directory


['.ipynb_checkpoints',
 'Jupyter_files',
 'lecture_names.xlsx',
 'Python',
 'Study_Plan',
 'WebScrapper.ipynb']

In [23]:
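# In a normal string literal, a backslash starts an escape sequence such as \n (newline) or \t (tab).
# \M is not a valid escape sequence, so Python keeps the backslash but emits a warning.
# Doubling the backslashes (or using a raw string, r'D:\My_Projects\study_hub') avoids the warning.
os.listdir('D:\\My_Projects\\study_hub')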

['.ipynb_checkpoints',
 'Jupyter_files',
 'lecture_names.xlsx',
 'Python',
 'Study_Plan']

In [27]:
# using list comprehensions to separate files from directories
files = [f for f in os.listdir('.') if os.path.isfile(f)] #  checks if the item is a file.
directories =[d for d in os.listdir('.') if os.path.isdir(d)] # checks if the item is a directory.

print("Files :",files)
print("Directories :",directories)

Files : ['1. Python - Syntax & Semantics, Core Building Bocks , Operators.ipynb', '2. Python - Control Flow, Data Structures & Functions.ipynb', '3. Python - Object-Oriented Programming & Design.ipynb', '4. Python Advanced Modules & Libraries.ipynb', 'Binary_file.bin', 'Binary_tempfile.bin', 'BufferWrite_Textfile.txt', 'copy_of_Python_tempfile.txt', 'Dictionary_to_csv.csv', 'example.txt', 'Json_test_file.json', 'Python Excercises.ipynb', 'Python_S2D_tempfile.txt', 'Python_tempfile.txt', 'Read_write_file.txt']
Directories : ['.ipynb_checkpoints', 'backup', 'package']


In [31]:
# Joining Paths
# constructing a file path by joining a directory name (folder) and a file name (file.txt) using os.path.join().
dir_name="Test_folder"
file_name="Test_file.txt"
full_path=os.path.join(dir_name,file_name)
print(full_path)

# Why Use os.path.join()?
# - It ensures compatibility across different operating systems. Windows uses \, while macOS/Linux use / in file paths.
# - It avoids manual string concatenation like "folder/" + "file.txt" which may lead to path errors.

Test_folder\Test_file.txt
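
The same join can also be written with `pathlib` from the standard library, which the notebook does not use. The sketch below is only an illustration of that alternative, reusing the folder and file names from the cell above. It assumes nothing about whether the path exists, because neither approach touches the disk.

```python
import os
from pathlib import Path

dir_name = "Test_folder"
file_name = "Test_file.txt"

joined = os.path.join(dir_name, file_name)   # plain string
path_obj = Path(dir_name) / file_name        # Path object; "/" joins the parts

print(joined)
print(path_obj)

# A Path object also exposes the pieces of the path as attributes.
print(path_obj.name)     # file name with extension
print(path_obj.suffix)   # extension only
print(path_obj.parent)   # containing folder
```

Both forms pick the separator of the operating system they run on, so the printed path uses `\` on Windows and `/` on macOS/Linux.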


In [35]:
# If you want the complete path and not just the folder and file name, prepend os.getcwd()
dir_name="fake_folder"
file_name="fake_file.txt"
full_path=os.path.join(os.getcwd(),dir_name,file_name)
print(full_path)
# os.path.join doesn't create any folders or files; it only joins the parts in the correct format

D:\Project_2026\data-science-learning-journey\Topics\Python\fake_folder\fake_file.txt


In [39]:
# Command to check if the said path exists
path='copy_of_Python_tempfile.txt'
if os.path.exists(path):
    print(f"The path '{path}' exists")
else:
    print(f"The path '{path}' does not exist")

The path 'copy_of_Python_tempfile.txt' exists


In [43]:
# Checking if a Path is a File or Directory
import os

path = 'copy_of_Python_tempfile.txt'
if os.path.isfile(path):
    print(f"The path '{path}' is a file.")
elif os.path.isdir(path):
    print(f"The path '{path}' is a directory.")
else:
    print(f"The path '{path}' is neither a file nor a directory.")


The path 'copy_of_Python_tempfile.txt' is a file.


In [45]:
import os

path = 'package'
if os.path.isfile(path):
    print(f"The path '{path}' is a file.")
elif os.path.isdir(path):
    print(f"The path '{path}' is a directory.")
else:
    print(f"The path '{path}' is neither a file nor a directory.")


The path 'package' is a directory.


In [50]:
## Getting the absolute path
relative_path='fake_file.xlsx'
absolute_path=os.path.abspath(relative_path)
print(absolute_path)

D:\Project_2026\data-science-learning-journey\Topics\Python\fake_file.xlsx


**os.path.abspath(relative_path)** doesn't check whether the file exists. It simply builds and returns an absolute path based on the current working directory.

Why Does This Happen?
- Python takes your relative path (fake_file.xlsx) and joins it to the current working directory.
- It doesn't verify that fake_file.xlsx is really there; it just generates the full path as if it existed.
- To check whether the file actually exists, combine it with os.path.exists(), as in the code below.

In [53]:
# Here the relative path does not exist under the current working directory, so os.path.exists() reports that no file was found
import os
relative_path = 'study_hub/lecture_names.xlsx'
absolute_path = os.path.abspath(relative_path)

if os.path.exists(absolute_path):  # Verifies if the file exists
    print(f"The file exists at: {absolute_path}")
else:
    print(f"No file found at: {absolute_path}")

No file found at: D:\Project_2026\data-science-learning-journey\Topics\Python\study_hub\lecture_names.xlsx


In [57]:
import os
relative_path = 'Binary_file.bin'
absolute_path = os.path.abspath(relative_path)

if os.path.exists(absolute_path):  # Verifies if the file exists
    print(f"The file exists at: {absolute_path}")
else:
    print(f"No file found at: {absolute_path}")

The file exists at: D:\Project_2026\data-science-learning-journey\Topics\Python\Binary_file.bin


### Logging Debugger

A **Logging Debugger** generally refers to the practice of using debug-level logging to trace and understand how code executes, helping developers identify issues by recording detailed runtime information.

What It Means
- Debug Logging: Writing detailed messages about an application’s internal state (variable values, execution paths, function calls) into logs.
- Purpose: Acts like breadcrumbs left by your program, so when something breaks, you can retrace the steps and see what happened.
- Logging Debugger: Not a separate tool, but a way of using logging as a debugging mechanism. Instead of attaching a traditional debugger, developers rely on logs to understand system behavior.

###  Comparison of Logging Levels

| **Level** | **Purpose**              | **Typical Use Case**                |
|-----------|--------------------------|-------------------------------------|
| CRITICAL  | Severe failures          | Application cannot continue running |
| ERROR     | Failed operations        | Exceptions, failed requests         |
| WARNING   | Potential issues         | Deprecated API usage                |
| INFO      | General status updates   | Service started, user logged in     |
| DEBUG     | Detailed internal state  | Variable values, execution paths    |

In [68]:
import logging

In [76]:
# Configuring Python's logging module to write logs into a file named Logfile_test.log with a minimum level of INFO
logging.basicConfig(filename = "Logfile_test.log" , # Log file name
                    level = logging.INFO) # Minimum log level

- filename = "Logfile_test.log" → All log messages will be written to this file (created in the current working directory).
- level = logging.INFO → Only messages with severity INFO and above (WARNING, ERROR, CRITICAL) will be recorded.
- DEBUG messages will be ignored because the threshold is set to INFO.

In [81]:
# Let's log messages at different levels; they will be recorded in the file we created, starting with INFO
logging.info("This is an info message")
# Below line will be updated in file "Logfile_test.log"
# INFO:root:This is an info message

In [83]:
logging.error("this is my error")

In [85]:
logging.critical("this is my critical")

In [87]:
logging.warning("this is my warning ")

In [93]:
logging.debug("this is my info related to debug") # Will NOT be logged because the threshold is set to INFO.

In [95]:
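logging.noset("this is my noset related log") # Raises AttributeError: the lowest level is spelled NOTSET and is only a level constant (logging.NOTSET), not a logging function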

AttributeError: module 'logging' has no attribute 'noset'

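The logging levels are listed below in increasing order of severity. Because we used "level = logging.INFO", DEBUG messages are not recorded, while everything at INFO or above is. NOTSET is not a logging function (which is why logging.noset() raised an AttributeError); it is the level value of a logger that has not been given a level of its own.
1. NOTSET
2. DEBUG
3. INFO
4. WARNING
5. ERROR
6. CRITICAL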
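
The ordering comes from the numeric value behind each level name. The sketch below prints those values and shows the INFO threshold in action. It is an illustration only: the logger name `level_demo` is an assumption, and it writes to an in-memory stream instead of a file so that it does not interfere with the notebook's own log files.

```python
import io
import logging

# Numeric values behind the level names; a higher number means more severe.
for name in ["NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]:
    print(name, getattr(logging, name))

# A separate logger (hypothetical name) that writes to memory, not to a file.
stream = io.StringIO()
demo_logger = logging.getLogger("level_demo")
demo_logger.setLevel(logging.INFO)      # threshold: INFO and above
demo_logger.propagate = False           # keep these messages away from the root logger

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
demo_logger.addHandler(handler)

demo_logger.debug("dropped, below the INFO threshold")
demo_logger.info("kept")
demo_logger.error("kept as well")

captured = stream.getvalue()
print(captured)
```

Only the INFO and ERROR lines end up in `captured`; the DEBUG call is filtered out before it reaches the handler.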

In [108]:
# Lets set up Python’s logging system to capture all messages from DEBUG upwards into a file called test1.log, with each entry including a timestamp
import logging

logging.basicConfig(
    filename="test1.log",          # Log file name
    level=logging.DEBUG,           # Minimum log level (DEBUG and above) Captures everything: DEBUG,INFO,WARNING,ERROR,CRITICAL
    format="%(asctime)s %(message)s"  # Log format: timestamp + message
)

In [110]:
# Example Usage
logging.debug("Debugging details here")
logging.info("General info message")
logging.warning("This is a warning")
logging.error("An error occurred")
logging.critical("Critical issue!")

In [116]:
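print(os.path.exists("test1.log")) # False: logging was already configured earlier, so this second basicConfig() call was ignored and no file was created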

False


In [122]:
import logging

# Remove all existing handlers (important in Jupyter/interactive sessions)
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)

# Now reconfigure logging
logging.basicConfig(
    filename="test1.log",
    level=logging.DEBUG,
    format="%(asctime)s - %(levelname)s - %(message)s"
)

# Test messages
logging.debug("Debug message goes to test1.log")
logging.info("Info message goes to test1.log")
logging.warning("Warning message goes to test1.log")
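
As an alternative to removing the handlers by hand, `logging.basicConfig` accepts `force=True` (assuming Python 3.8 or later), which removes and closes any existing root handlers before applying the new configuration. The sketch below is not from the original notebook; the file name `force_demo.log` is hypothetical and it lives in a temporary directory.

```python
import logging
import os
import tempfile

log_dir = tempfile.mkdtemp()                         # throwaway directory for the demo
log_path = os.path.join(log_dir, "force_demo.log")   # hypothetical log file name

# First configuration, standing in for what an earlier cell already did.
logging.basicConfig(level=logging.INFO)

# A second basicConfig() call is normally ignored once the root logger has handlers.
# force=True removes the existing handlers first, so this configuration takes effect.
logging.basicConfig(
    filename=log_path,
    level=logging.DEBUG,
    format="%(asctime)s - %(levelname)s - %(message)s",
    force=True,
)

logging.debug("Debug message goes to the new file")
logging.shutdown()  # flush and close the handlers so the file is complete

with open(log_path) as f:
    log_text = f.read()
print(log_text)
```

This keeps the reconfiguration in a single call, which is convenient in interactive sessions where cells are re-run often.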

In [128]:
logging.shutdown() # Perform any cleanup actions in the logging system (e.g. flushing buffers)

In [134]:
# Same issue: basicConfig() is ignored while the root logger still has handlers, so they must be removed first; only then did 'test3.log' appear in the directory
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)

logging.basicConfig(filename = "test3.log" ,
                    level = logging.DEBUG, 
                    format = '%(asctime)s %(name)s %(levelname)s  %(message)s')

In [136]:
logging.info("this is my info log")

In [142]:
l = [1,2,3,4,[4,5,6] , "sudh" ,"kumar"]

In [176]:
print(type(l[1]))
print(type(l[4]))
print(type(l[5]))

<class 'int'>
<class 'list'>
<class 'str'>


In [184]:
# Iterating through a mixed list, separating integers and strings, and logging details when you encounter a nested list
import logging

l = [1, 2, 3, 4, [4, 5, 6], "sudh", "kumar"]
l1_int = []   # to collect integers
l2_str = []   # to collect strings

for i in l:
    if type(i) == list:              # if element is a list
        for j in i:                  # iterate through sub-elements
            # Log both the inner loop value (j) and the outer loop value (i) to trace the iteration,
            # e.g. "logging my j 4 and i is [4, 5, 6]"
            logging.info("logging my j {j} and i is {i}".format(i=i, j=j))
            if type(j) == int:       # if sub-element is int
                l1_int.append(j)
    elif type(i) == int:             # if element is int
        l1_int.append(i)
    else:
        if type(i) == str:           # if element is string
            l2_str.append(i)

In [186]:
print(l1_int)
print(l2_str)

[1, 2, 3, 4, 4, 5, 6]
['sudh', 'kumar']


In [188]:
# Including logs

l1_int = []
l2_str = []
for i in l : 
    logging.info("this is the start of my first for loop {}".format(l)) # You can understand how your program is moving 
    logging.info("this is the value of i am logging {}".format(i))  # in a step by step way
    if type(i) == list:
        for j in i :
            logging.info("logging my j {j} and i is {i}".format(i = i ,j = j))
            if type(j) == int :
                l1_int.append(j)
    elif type(i) == int :
        l1_int.append(i)
        
    else :
        if type(i) == str:
            l2_str.append(i)
logging.info("this is my final result with all int {l1}, with all str {l2}".format(l1 =l1_int ,l2 =l2_str ))


# print statements are not a solution in production. When you need a permanent record of what the program did,
# logs come in handy

#### Multithreading

In Python, multithreading allows multiple threads to run concurrently within a single process. It’s most useful for **I/O-bound tasks** (like network requests, file operations) but less effective for **CPU-bound tasks** because of Python’s **Global Interpreter Lock (GIL)**.


What is Multithreading?
- **Thread**: A lightweight unit of execution within a process.
- Multithreading: Running multiple threads concurrently to improve responsiveness and efficiency.
- In Python, threads share the same memory space, making communication easier but requiring careful synchronization.
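
The benefit for I/O-bound work can be measured with a small timing comparison. In this sketch `time.sleep` stands in for an I/O wait such as a network request; the helper `fake_io_task` and the delay values are assumptions made for the illustration.

```python
import threading
import time

def fake_io_task(delay):
    """Stand-in for an I/O wait (hypothetical helper); the thread is idle while sleeping."""
    time.sleep(delay)

delays = [0.2, 0.2, 0.2, 0.2]

# Sequential: each wait starts only after the previous one has finished.
start = time.perf_counter()
for d in delays:
    fake_io_task(d)
sequential_time = time.perf_counter() - start

# Threaded: all waits overlap, so the total is close to the longest single wait.
start = time.perf_counter()
threads = [threading.Thread(target=fake_io_task, args=(d,)) for d in delays]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

The same comparison with a pure computation in place of `time.sleep` would typically show little or no gain, because of the GIL.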

In [194]:
import threading

def test(id) :
    print("this is my test id : %d " % id) # In run time,%d will be replaced by id

In [196]:
test(20)

this is my test id : 20 


In [198]:
test(23)

this is my test id : 23 


In [200]:
test(3)

this is my test id : 3 


In [202]:
thred = [threading.Thread( target = test, args = (i,)) for i in [12,23,34]]

In [204]:
thred

[<Thread(Thread-5 (test), initial)>,
 <Thread(Thread-6 (test), initial)>,
 <Thread(Thread-7 (test), initial)>]

In [206]:
for p in thred:
    p.start()

this is my test id : 12 
this is my test id : 23 
this is my test id : 34 


In [210]:
# Task: We have 3 links, each pointing to some data, and we need to fetch the data from each of them
# We will use urllib.request from the standard library to download from the different links

import urllib.request

def file_download(url, filename):
    urllib.request.urlretrieve(url, filename) # we are capturing source (url) & destination (filename) to store the data

In [218]:
# It will download all the data from the url and store it in the file specified

file_download('https://raw.githubusercontent.com/itsfoss/text-files/master/agatha.txt' , "Multitread_test.txt")

In [220]:
# Instead of calling all the urls one by one, we want one thread per url so the downloads can overlap
url_list = ['https://raw.githubusercontent.com/itsfoss/text-files/master/agatha.txt' , 'https://raw.githubusercontent.com/itsfoss/text-files/master/sherlock.txt' ,'https://raw.githubusercontent.com/itsfoss/text-files/master/sample_log_file.txt' ]

In [222]:
url_list

['https://raw.githubusercontent.com/itsfoss/text-files/master/agatha.txt',
 'https://raw.githubusercontent.com/itsfoss/text-files/master/sherlock.txt',
 'https://raw.githubusercontent.com/itsfoss/text-files/master/sample_log_file.txt']

In [224]:
# The URL list is ready; now we create a list of file names where the downloaded data will be stored
data_file_list = ['data1.txt', 'data2.txt','data3.txt']

In [236]:
# We will now use threading
# Build one Thread per URL; each one calls file_download with its own URL and destination file
# The list comprehension only creates the threads, nothing runs until start() is called

thread1 = [
    threading.Thread(
        target=file_download,                   # function each thread will run
        args=(url_list[i], data_file_list[i]),  # URL to download and file name to save it to
    )
    for i in range(len(url_list))
]

In [238]:
# Right now, you’ve only created the threads. To actually run them we need start

for t in thread1:
    t.start()

In [240]:
thread1

[<Thread(Thread-14 (file_download), stopped 141788)>,
 <Thread(Thread-15 (file_download), stopped 132160)>,
 <Thread(Thread-16 (file_download), stopped 142484)>]

In [242]:
# Defining a function test2(x) that prints a message 10 times with a 1-second delay between each print
import time

def test2(x) : 
    for i in range(10) : 
        print(" test1 print the value of x %d and print the value of i %d " %(x,i))
        time.sleep(1) # It will pause and execute one by one ...slowly

In [244]:
test2(5)

 test1 print the value of x 5 and print the value of i 0 
 test1 print the value of x 5 and print the value of i 1 
 test1 print the value of x 5 and print the value of i 2 
 test1 print the value of x 5 and print the value of i 3 
 test1 print the value of x 5 and print the value of i 4 
 test1 print the value of x 5 and print the value of i 5 
 test1 print the value of x 5 and print the value of i 6 
 test1 print the value of x 5 and print the value of i 7 
 test1 print the value of x 5 and print the value of i 8 
 test1 print the value of x 5 and print the value of i 9 


In [250]:
thread2 = [threading.Thread(target=test2 ,
                            args=(i,)) for i in [100 ,10,20,5]] # 4 threads [100 ,10,20,5]

In [252]:
for t in thread2:
    t.start()
    
# Each thread prints its first line and then sleeps for 1 second. While one thread sleeps, the next one
# gets to run, which is why all 4 inputs print i = 0 before any of them moves on to i = 1.
# The threads take turns instead of the first input finishing all 10 iterations on its own.

 test1 print the value of x 100 and print the value of i 0 
 test1 print the value of x 10 and print the value of i 0 
 test1 print the value of x 20 and print the value of i 0 
 test1 print the value of x 5 and print the value of i 0 


In [256]:
# The cell above returned right after start(), so only the first round of output shows up under it. Restructuring with notes and join()
import time
import threading

# Define a function that simulates a task (e.g., downloading, processing)
# It takes a parameter x and prints its value along with a loop counter
def test2(x):
    for i in range(10):
        # Print the current values of x and i
        print("test2 print the value of x %d and print the value of i %d" % (x, i))
        # Pause for 1 second to simulate a time-consuming task
        time.sleep(1)

# Create a list of Thread objects using list comprehension
# Each thread will run the test2 function with a different x value
thread2 = [
    threading.Thread(target=test2, args=(i,))
    for i in [100, 10, 20, 5]
]

# Start all threads — they begin executing test2 concurrently
for t in thread2:
    t.start()

# Wait for all threads to complete before moving on
# join() blocks until each thread has finished, so the cell does not return early
for t in thread2:
    t.join()

test2 print the value of x 100 and print the value of i 0
test2 print the value of x 10 and print the value of i 0
test2 print the value of x 20 and print the value of i 0
test2 print the value of x 5 and print the value of i 0
test2 print the value of x 100 and print the value of i 1
test2 print the value of x 5 and print the value of i 1
test2 print the value of x 20 and print the value of i 1
test2 print the value of x 10 and print the value of i 1
test2 print the value of x 100 and print the value of i 2
test2 print the value of x 10 and print the value of i 2
test2 print the value of x 20 and print the value of i 2
test2 print the value of x 5 and print the value of i 2
test2 print the value of x 100 and print the value of i 3
test2 print the value of x 10 and print the value of i 3
test2 print the value of x 5 and print the value of i 3
test2 print the value of x 20 and print the value of i 3
test2 print the value of x 100 and print the value of i 4
test2 print the value of x 20 

In [260]:
# Initialize a shared variable that all threads will try to update
shared_var = 0

# Create a Lock object to ensure only one thread modifies shared_var at a time
lock_var = threading.Lock() # ensures only one thread can access a critical section of code at a time

# Define a function that each thread will execute
def test4(x):
    # Declare shared_var as global so we can modify it inside the function
    global shared_var

    # Acquire the lock before modifying shared_var
    with lock_var:
        # Safely increment the shared variable
        shared_var = shared_var + 1

        # Print the current thread's input (x) and the updated shared_var
        print("value of x %d and value of shareed_var %d " % (x, shared_var))

        # Simulate a delay while holding the lock (not ideal in real-world code)
        time.sleep(1)

# Create a list of threads, each calling test4 with a different argument
thread5 = [
    threading.Thread(target=test4, args=(i,))
    for i in [1, 2, 3, 4, 4, 5]  # Note: 4 appears twice
]

# Start all threads — they begin executing test4 concurrently
for t in thread5:
    t.start()

# Wait for all threads to complete. join() does not start anything (start() already did that); it only blocks until each thread has finished

for t in thread5:
    t.join()


value of x 1 and value of shareed_var 1 
value of x 2 and value of shareed_var 2 
value of x 3 and value of shareed_var 3 
value of x 4 and value of shareed_var 4 
value of x 4 and value of shareed_var 5 
value of x 5 and value of shareed_var 6 


In [264]:
test4(3)

value of x 3 and value of shareed_var 7 


In [266]:
test4(5)

value of x 5 and value of shareed_var 8 


What This Code Demonstrates
- Race condition prevention: Without the **lock_var**, multiple threads could try to update **shared_var** at the same time, leading to incorrect results.
- Thread-safe increment: The **with lock_var**: block ensures that only one thread at a time can execute the code that modifies **shared_var**.
- Thread interleaving: Even though threads run concurrently, the lock ensures that updates to **shared_var** happen one at a time.
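
The effect of the lock is easier to see with many increments per thread. The sketch below is a variation on the example above, not the original code: the names `totals`, `counter_lock` and `add_many` are assumptions, and the counter is kept in a dictionary so that no `global` statement is needed.

```python
import threading

totals = {"count": 0}            # shared state updated by every thread
counter_lock = threading.Lock()

def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    for _ in range(times):
        with counter_lock:       # only one thread at a time may run this block
            totals["count"] += 1

workers = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(totals["count"])  # 4 threads x 10,000 increments = 40000
```

Here the lock is held only for the increment itself, which keeps the critical section short; the `time.sleep(1)` inside the lock in the earlier cell is what made those threads run one after another.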

#### Multiprocessing

Multiprocessing in Python is a way to run multiple processes in parallel, allowing programs to fully utilize multiple CPU cores and bypass the Global Interpreter Lock (GIL). It’s especially useful for CPU-bound tasks like heavy computations, simulations, or data processing.


##### Key Concepts of Multiprocessing
- Process-based parallelism: Unlike threads, each process has its own Python interpreter and memory space, so they don’t interfere with each other.
- Bypasses the GIL: Python's GIL lets only one thread execute Python bytecode at a time, but multiprocessing uses separate processes, each with its own interpreter and GIL, enabling true parallel execution.
- Cross-platform: Works on both Windows and Unix systems.
- Local & remote concurrency: Can run processes on the same machine or across multiple machines.

##### Core Features
- Process class: Lets you create and manage individual processes.
- Pool class: Provides a convenient way to parallelize tasks across multiple inputs (data parallelism).
- Shared data: Supports inter-process communication (IPC) via Queue, Pipe, and shared memory objects.
- Synchronization: Tools like Lock, Semaphore, and Event help coordinate processes.

##### Multiprocessing vs. Multithreading in Python

| Feature              | Multiprocessing                     | Multithreading                  |
|----------------------|-------------------------------------|---------------------------------|
| **Parallel execution** | True parallelism (multiple CPUs)    | Limited by GIL (pseudo-parallel)|
| **Memory space**       | Separate per process                | Shared among threads            |
| **Best for**           | CPU-bound tasks                     | I/O-bound tasks                  |
| **Overhead**           | Higher (process creation)           | Lower (thread creation)          |
| **Python GIL impact**  | Bypasses GIL                        | Restricted by GIL                |

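**Note:** The multiprocessing code kept running in a loop here because of the environment, not the code. Multiprocessing is notoriously tricky inside Jupyter notebooks on Windows because of how processes are spawned: each child process starts a fresh interpreter and re-imports the main module, where functions defined in notebook cells cannot be found.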
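
A common way around this is to put the code in a regular `.py` script and run it from the terminal. The sketch below is a minimal example under that assumption: the file name `mp_demo.py` and the function `sum_of_squares` are hypothetical, and the `if __name__ == "__main__":` guard is what stops child processes from re-running the pool setup when they import the script.

```python
# Save as mp_demo.py (hypothetical file name) and run from a terminal: python mp_demo.py
from multiprocessing import Pool

def sum_of_squares(n):
    """CPU-bound work: add up i*i for every i below n."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [10, 100, 1_000, 10_000]

    # Pool.map splits the inputs across worker processes and collects the results in order.
    with Pool(processes=2) as pool:
        results = pool.map(sum_of_squares, inputs)

    for n, total in zip(inputs, results):
        print(n, total)
```

The worker function is defined at module level so that the child processes can import it by name.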


#### Regular Expressions

- A regular expression is a sequence of characters that defines a search pattern.
- They’re used for matching, searching, extracting, and replacing text.
- Think of them as a “mini-language” for describing text patterns.


**Common Uses**
- Validation: Check if an email, phone number, or password is valid.
- Search: Find specific words, numbers, or patterns in text.
- Extract: Pull out useful data (like dates, hashtags, or URLs).
- Replace: Clean or reformat text (e.g., remove extra spaces, mask sensitive info).
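
Each of these uses maps onto a function in Python's `re` module. The sketch below is an illustration with made-up sample text (the email address, hashtags and date are assumptions), showing one call per use.

```python
import re

# Made-up sample text with a double space, an email, two hashtags and a date.
sample = "Contact  alice@example.com about #python and #regex before 2024-01-15"

# Validation: fullmatch succeeds only if the whole string fits the pattern.
is_date = re.fullmatch(r"\d{4}-\d{2}-\d{2}", "2024-01-15") is not None

# Search: find the first hashtag.
first_tag = re.search(r"#\w+", sample).group()

# Extract: pull out every hashtag.
all_tags = re.findall(r"#\w+", sample)

# Replace: collapse repeated whitespace, then mask the name before the "@".
cleaned = re.sub(r"\s+", " ", sample)
masked = re.sub(r"\w+@", "***@", cleaned)

print(is_date)
print(first_tag)
print(all_tags)
print(masked)
```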

##### Core Regular Expression (Regex) Syntax

| Pattern | Meaning | Example |
|---------|---------|---------|
| `.`     | Any character except newline | `a.c` → matches `abc`, `axc` |
| `^`     | Start of string | `^Hello` → matches `"Hello world"` |
| `$`     | End of string | `world$` → matches `"Hello world"` |
| `*`     | Zero or more repetitions | `ab*` → matches `a`, `ab`, `abb` |
| `+`     | One or more repetitions | `ab+` → matches `ab`, `abb` |
| `?`     | Zero or one repetition | `colou?r` → matches `color`, `colour` |
| `{n}`   | Exactly n repetitions | `\d{3}` → matches `123` |
| `{n,m}` | Between n and m repetitions | `\d{2,4}` → matches `12`, `1234` |
| `[]`    | Character set | `[aeiou]` → matches any vowel |
| `\|`    | OR operator | `cat\|dog` → matches `cat` or `dog` |
| `()`    | Grouping | `(ab)+` → matches `ab`, `abab` |

---

##### Special Character Classes (Shorthand)

| Pattern | Meaning | Example |
|---------|---------|---------|
| `\d`    | Digit (0–9) | `\d\d` → matches `42` |
| `\w`    | Word character (letters, digits, underscore) | `\w+` → matches `hello123` |
| `\s`    | Whitespace (space, tab, newline) | `\s` → matches `" "` |
| `\D`    | Non-digit | `\D+` → matches `abc!` |
| `\W`    | Non-word character | `\W` → matches `@`, `-`, `!` |
| `\S`    | Non-whitespace | `\S+` → matches `hello` |
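
Several rows of the two tables can be checked directly with `re.fullmatch`, which succeeds only when the entire string fits the pattern. This is a small sketch for experimenting; the candidate strings are taken from the tables, except `12345`, which is added as a deliberate non-match.

```python
import re

# (pattern, candidate string) pairs based on the tables above.
checks = [
    (r"a.c", "axc"),
    (r"colou?r", "colour"),
    (r"\d{3}", "123"),
    (r"\d{2,4}", "12345"),   # five digits is outside the 2 to 4 range
    (r"(ab)+", "abab"),
    (r"cat|dog", "dog"),
    (r"\w+", "hello123"),
    (r"\D+", "abc!"),
]

# Record whether each whole string fits its pattern.
results = [(pattern, s, re.fullmatch(pattern, s) is not None) for pattern, s in checks]
for pattern, s, ok in results:
    print(f"{pattern!r:12} {s!r:12} {ok}")

# \S+ with findall returns each run of non-whitespace characters.
words = re.findall(r"\S+", "hello  big world")
print(words)
```

Every pair reports True except `\d{2,4}` against `12345`, since `fullmatch` does not accept a match on only part of the string.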

In [3]:
# Regular expressions (regex) allow us to search for general patterns in text data (text mining)
# Python comes with a built-in library, re, which lets us create specialised pattern strings & search for matches within text
# e.g. for a phone number like 555-555-5555, the regex pattern is r"(\d\d\d)-(\d\d\d)-(\d\d\d\d)"
# A plain substring search with "in" looks for the exact text and returns True or False

text = " The agent's phone number is  123-234-4567. Call Soon!! "

In [5]:
# Text search will return true on exact match
'phone'in text

True

In [7]:
# Importing the re module
import re

In [9]:
# create search text
pattern = 'phone'

In [11]:
# Pass the search pattern and the source text
re.search(pattern,text)
# It confirms the match and reports its index location: the match starts at index 13 and ends just before index 18

<re.Match object; span=(13, 18), match='phone'>

In [13]:
# now enter some other text which is not available in source text
pattern = 'wrong keyword'

In [15]:
# Since there is no match, re.search returns None, so the cell shows no output
re.search(pattern,text)

In [19]:
pattern2 = '123-234-4567'
re.search(pattern2,text)

<re.Match object; span=(30, 42), match='123-234-4567'>

In [21]:
# Assigning it to variable 
mymatch=re.search(pattern2,text)
mymatch

<re.Match object; span=(30, 42), match='123-234-4567'>

In [23]:
# You can also report the location using span
mymatch.span()

(30, 42)

In [25]:
# ask for the starting index value
mymatch.start()

30

In [27]:
# ask for the end index value
mymatch.end()

42

In [29]:
# re.search finds and returns only the first match. For multiple matches we have a few more functions
nexttext= "Hi Prisha, where is Gemma? Why are you quite Prisha? Answer me Prisha !!"

In [31]:
# Testing with re.search to look for name
Singlematch=re.search('Prisha',nexttext)

In [33]:
# Returns only first occurence
Singlematch

<re.Match object; span=(3, 9), match='Prisha'>

In [35]:
# Find multiple match 
groupmatch = re.findall('Prisha',nexttext)
groupmatch

['Prisha', 'Prisha', 'Prisha']

In [37]:
# to count the occurrences
len(groupmatch)

3

In [39]:
# We can use a for loop to get all the positions: finditer iterates through the text and yields a match object for each occurrence

for match in re.finditer ('Prisha',nexttext):
    print(match)

<re.Match object; span=(3, 9), match='Prisha'>
<re.Match object; span=(45, 51), match='Prisha'>
<re.Match object; span=(63, 69), match='Prisha'>


In [41]:
# for displaying only the location
for match in re.finditer ('Prisha',nexttext):
    print(match.span())

(3, 9)
(45, 51)
(63, 69)


In [43]:
# for displaying just the text 
for match in re.finditer ('Prisha',nexttext):
    print(match.group())

Prisha
Prisha
Prisha


#### Character Classes (Identifiers)

In [48]:
text = 'My phone number is 123-234-3456'

In [50]:
# in normal strings, "\" starts a special escape character such as \n - new line, \t - tab
# we prefix the pattern with "r" (raw string) so Python passes the backslashes to the regex engine unchanged instead of treating them as escape chars
phone = re.search(r'\d\d\d-\d\d\d-\d\d\d\d',text)
phone

<re.Match object; span=(19, 31), match='123-234-3456'>

In [52]:
# to grab the matched text itself
phone.group()

'123-234-3456'

In [56]:
# what if we had to include a 20+ digit search, instead of using "\d" 20+ times, we can use quantifiers
phone = re.search(r'\d{3}-\d{3}-\d{4}',text)
phone.group()

'123-234-3456'

In [58]:
# re.compile turns a pattern string into a reusable pattern object. The parentheses in the pattern create groups,
# which means you can call the groups separately, e.g. extract just the area code by its group index

phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

In [60]:
results = re.search(phone_pattern,text)
results.group()

'123-234-3456'

In [72]:
# Groups are numbered from 1; group(0), like group(), returns the whole match
results.group(1)

'123'
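
The groups can also be retrieved all at once, or given names. The sketch below reuses the phone number text from above; the group names `area`, `prefix` and `line` are assumptions chosen for the illustration.

```python
import re

text = "My phone number is 123-234-3456"
phone_pattern = re.compile(r"(\d{3})-(\d{3})-(\d{4})")

# A compiled pattern has its own search() method.
match = phone_pattern.search(text)
print(match.group(0))   # the whole match
print(match.groups())   # every group as a tuple

# Named groups: (?P<name>...) lets you ask for a group by name instead of by number.
named = re.search(r"(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})", text)
print(named.group("area"))
print(named.groupdict())
```

Named groups make the intent clearer when a pattern has several parts and their positions might change later.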

In [76]:
# Using "|" as OR operator

temp=re.search(r'cat|dog', "Thats my dog ")
temp.group()

'dog'

In [78]:
# Searching for the literal text 'at' (no wildcard yet) returns only the 'at' part of each word
re.findall(r'at','The cat Sat on mat')

['at', 'at', 'at']

In [80]:
# The wildcard "." matches any character except newline, including spaces, so be careful when using many of them
re.findall(r'...at','The cat Sat on matwent splat')

['e cat', 'n mat', 'splat']

In [82]:
# "^" anchors the match to the start of the string, e.g. to check that a string starts with a digit
re.findall(r'^\d','1 is a number')

['1']

In [84]:
# Ends with ...
re.findall(r'\d$','The number is 2')

['2']