This repository contains a collection of beginner-friendly Python scripts based on the Python for Everybody (PY4E) course. The programs demonstrate how to read files, parse text, and use lists, dictionaries, and string methods to extract useful information.
All programs work with sample text files (mbox-short.txt and romeo.txt), which can be downloaded from:
- Reads `mbox-short.txt` line by line.
- Finds lines starting with `"From "`.
- Prints the sender’s email address (the second word on the line).
- Prints a final count of such lines.
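A minimal sketch of the steps above, assuming the lines come from any iterable (an open file handle or a plain list). The function name `sender_addresses` and the sample lines are hypothetical illustrations, not code taken from this repository's scripts.

```python
def sender_addresses(lines):
    """Return the sender address (second word) of every line starting with "From "."""
    senders = []
    for line in lines:
        # The trailing space in "From " skips the separate "From:" header lines
        if not line.startswith("From "):
            continue
        words = line.split()
        if len(words) < 2:  # guard against a bare "From " line with no address
            continue
        senders.append(words[1])
    return senders


if __name__ == "__main__":
    # Hypothetical sample lines; with the real data, pass open("mbox-short.txt") instead
    sample = [
        "From alice@example.com Sat Jan  5 09:14:16 2008\n",
        "From: alice@example.com\n",
        "Subject: hello\n",
        "From bob@example.org Fri Jan  4 18:10:48 2008\n",
    ]
    found = sender_addresses(sample)
    for address in found:
        print(address)
    print("Count:", len(found))
```

Because the function accepts any iterable of strings, the same code works on a file handle (`with open("mbox-short.txt") as fh: sender_addresses(fh)`) and on a list in a quick test.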
- Reads `mbox-short.txt`.
- Extracts the hour from the timestamp in each `"From "` line.
- Builds a histogram of how many messages were sent during each hour.
- Prints the distribution sorted by hour.
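One way the hour histogram could be built, sketched under the assumption that each `"From "` line follows the layout `From <address> <day> <month> <date> <HH:MM:SS> <year>`. The function name `hour_histogram` and the sample lines are hypothetical.

```python
def hour_histogram(lines):
    """Count "From " lines per hour of their HH:MM:SS timestamp, sorted by hour."""
    counts = {}
    for line in lines:
        if not line.startswith("From "):
            continue
        words = line.split()
        # Assumed layout: From <address> <day> <month> <date> <HH:MM:SS> <year>
        if len(words) < 6:
            continue
        hour = words[5].split(":")[0]  # "09:14:16" -> "09"
        counts[hour] = counts.get(hour, 0) + 1
    # Two-digit hour strings sort in numeric order even as plain text
    return sorted(counts.items())


if __name__ == "__main__":
    # Hypothetical sample lines; with the real data, pass open("mbox-short.txt") instead
    sample = [
        "From alice@example.com Sat Jan  5 09:14:16 2008\n",
        "From bob@example.org Fri Jan  4 18:10:48 2008\n",
        "From carol@example.net Fri Jan  4 09:01:00 2008\n",
    ]
    for hour, count in hour_histogram(sample):
        print(hour, count)
```

`dict.get(key, 0) + 1` is the usual counting idiom: it starts a missing hour at zero instead of raising `KeyError`.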
- Finds a line such as `X-DSPAM-Confidence: 0.8475`.
- Uses `find()` and slicing to extract the number.
- Converts it to a float and prints it.
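The `find()`-and-slice step can be sketched as follows; the helper name `extract_confidence` is hypothetical. It locates the colon, slices everything after it, and relies on `float()` accepting surrounding whitespace.

```python
def extract_confidence(line):
    """Return the number after the colon in a line like 'X-DSPAM-Confidence: 0.8475'."""
    colon = line.find(":")  # find() returns -1 when the character is absent
    if colon == -1:
        raise ValueError("no colon in line: " + repr(line))
    # Slice from just past the colon; float() ignores leading/trailing whitespace
    return float(line[colon + 1:])


if __name__ == "__main__":
    text = "X-DSPAM-Confidence: 0.8475"
    print(extract_confidence(text))
```

Checking for `-1` matters because slicing with `line[-1 + 1:]` would silently return the whole line instead of failing.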
- Reads `mbox-short.txt`.
- Builds a dictionary mapping sender email addresses to their message counts.
- Finds and prints the email address with the highest count.
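A sketch of the count-then-maximum approach described above, assuming senders are taken from `"From "` lines as in the earlier program. The name `most_frequent_sender` and the sample lines are hypothetical; the explicit loop mirrors the "max loop" pattern rather than calling `max()`.

```python
def most_frequent_sender(lines):
    """Return (address, count) for the most common sender on "From " lines."""
    counts = {}
    for line in lines:
        words = line.split()
        # The length check skips blank lines; words[0] == "From" excludes "From:" headers
        if len(words) < 2 or words[0] != "From":
            continue
        counts[words[1]] = counts.get(words[1], 0) + 1

    # Explicit maximum loop; with strict ">" the first address seen wins a tie
    best_address, best_count = None, None
    for address, count in counts.items():
        if best_count is None or count > best_count:
            best_address, best_count = address, count
    return best_address, best_count


if __name__ == "__main__":
    # Hypothetical sample lines; with the real data, pass open("mbox-short.txt") instead
    sample = [
        "From alice@example.com Sat Jan  5 09:14:16 2008\n",
        "From bob@example.org Fri Jan  4 18:10:48 2008\n",
        "From alice@example.com Fri Jan  4 16:10:39 2008\n",
    ]
    address, count = most_frequent_sender(sample)
    print(address, count)
```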
- Reads `romeo.txt` line by line.
- Splits each line into words and builds a list of unique words.
- Sorts the list alphabetically using Python’s `sort()` method.
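A minimal sketch of the unique-word list, assuming a plain list (not a set) is used so the `sort()` step applies directly. The function name `unique_sorted_words` is hypothetical, and the sample lines simply stand in for the contents of `romeo.txt`.

```python
def unique_sorted_words(lines):
    """Collect each word once, in first-seen order, then sort the list in place."""
    words = []
    for line in lines:
        for word in line.split():
            # "not in" scans the list linearly; fine for a small file like romeo.txt
            if word not in words:
                words.append(word)
    # list.sort() is case-sensitive: uppercase letters sort before lowercase ones
    words.sort()
    return words


if __name__ == "__main__":
    # Sample lines; with the real data, pass open("romeo.txt") instead
    sample = [
        "But soft what light through yonder window breaks\n",
        "It is the east and Juliet is the sun\n",
    ]
    print(unique_sorted_words(sample))
```

A `set` would make the membership test faster, but the list version keeps the exercise focused on lists and `sort()`.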
- Prompts for a file name (e.g., `mbox-short.txt`).
- Reads the file and extracts floating-point values from lines starting with `X-DSPAM-Confidence:`.
- Computes and prints the average value (without using `sum()` or a variable named `sum`).
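The averaging step can be sketched with a running total and a counter, which avoids both `sum()` and a variable named `sum`. The function name `average_confidence` and the sample lines are hypothetical; returning `None` when no matching lines exist is an assumed design choice to avoid dividing by zero.

```python
def average_confidence(lines):
    """Average the values on 'X-DSPAM-Confidence:' lines without calling sum()."""
    total = 0.0  # running total (deliberately not named "sum")
    count = 0
    for line in lines:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        colon = line.find(":")
        total += float(line[colon + 1:])
        count += 1
    if count == 0:
        return None  # no matching lines, so there is nothing to average
    return total / count


if __name__ == "__main__":
    # Hypothetical sample lines; the real script would read a user-supplied file
    sample = [
        "X-DSPAM-Confidence: 0.8475\n",
        "Subject: hello\n",
        "X-DSPAM-Confidence: 0.6178\n",
    ]
    print("Average spam confidence:", average_confidence(sample))
```

To reproduce the prompt described above, a caller could read a name with `input("Enter file name: ")` and pass `open(name)` to the function.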
- Clone this repository:

  ```bash
  git clone https://github.com/shaazejahan/python-file-parsing.git
  cd python-file-parsing
  ```

- Download the sample files from the links above and place them in the same directory as the Python scripts.
- Run any program with:

  ```bash
  python script_name.py
  ```
- File handling (`open`, `readline`, loops)
- String manipulation (`split()`, `find()`, slicing)
- Lists and dictionaries
- Counting and histogramming
- Simple algorithms (max loop, average calculation, sorting)
👩‍💻 Shaaz E Jahan