Welcome! This repository contains a collection of assignments, projects, and code snippets from the "Introduction to Data Science in Python" course. The goal is to provide a practical, hands-on portfolio of work demonstrating core data science concepts and Python programming skills.
This repository covers a range of fundamental topics essential for any aspiring data scientist. The projects focus on data manipulation, analysis, database interaction, and algorithmic thinking.
- Data Manipulation: Using `pandas` and `numpy` for cleaning, transforming, and analyzing datasets.
- Text Processing: Leveraging regular expressions (`re`) for parsing and extracting information from unstructured text.
- Database Interaction: Using `sqlite3` to create, populate, and query relational databases from Python.
- Algorithmic Problem Solving: Applying logic to solve challenges and work with external modules.
- Data Ingestion: Reading and parsing data from various formats like JSON.
Here are some of the key assignments you can find in this repository:
- Description: A set of functions that use regular expressions to parse names from a string, extract students with a specific grade from a text file, and convert a web log file into a structured list of dictionaries.
- Concepts: Regular expressions (`re`), file I/O, list comprehensions, string manipulation.
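The log-parsing task described above can be sketched roughly as follows. This is a minimal illustration, not the assignment's actual solution: the log line layout, the `LOG_PATTERN` name, the `parse_log` helper, and the sample data are all assumptions made for the example.

```python
import re

# Assumed log layout (hypothetical, not taken from the course data):
#   host - user_name [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) - (?P<user_name>\S+) \[(?P<time>[^\]]+)\] "(?P<request>[^"]+)"'
)


def parse_log(text):
    """Return one dictionary per log entry that matches LOG_PATTERN."""
    # groupdict() maps each named group to the text it captured.
    return [match.groupdict() for match in LOG_PATTERN.finditer(text)]


# Made-up sample entries used only to exercise the pattern.
sample = (
    '203.0.113.7 - alice [21/Jun/2019:15:45:24 -0700] "POST /login HTTP/1.1" 302 4622\n'
    '203.0.113.8 - - [21/Jun/2019:15:45:25 -0700] "GET /index.html HTTP/1.1" 200 1024\n'
)
entries = parse_log(sample)
for entry in entries:
    print(entry)
```

Named groups keep the pattern readable and let `groupdict()` build each dictionary directly, which pairs naturally with a list comprehension.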
- Description: A Python script that reads student roster data from a JSON file, creates a normalized database schema (`User`, `Course`, `Member`), and populates the tables. This demonstrates a complete data ingestion and storage pipeline.
- Concepts: `sqlite3` for database management, `json` for data parsing, relational database design (many-to-many relationships), SQL `JOIN`s.
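As a rough sketch of that pipeline, the snippet below loads a small roster, fills the three tables, and reads the data back with a `JOIN`. The column names, the `[name, course title, role]` record shape, and the sample records are assumptions for illustration; the real `roster.py` and its data file may differ.

```python
import json
import sqlite3

# Made-up roster records in an assumed [name, course title, role] shape.
roster_json = '[["Ada", "si110", 1], ["Ben", "si110", 0], ["Ada", "si106", 0]]'

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
cur = conn.cursor()

# Member is the junction table that models the many-to-many relationship.
cur.executescript("""
CREATE TABLE User   (id INTEGER PRIMARY KEY, name  TEXT UNIQUE);
CREATE TABLE Course (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE Member (
    user_id   INTEGER,
    course_id INTEGER,
    role      INTEGER,
    PRIMARY KEY (user_id, course_id)
);
""")

for name, title, role in json.loads(roster_json):
    # INSERT OR IGNORE skips names and titles that are already stored.
    cur.execute("INSERT OR IGNORE INTO User (name) VALUES (?)", (name,))
    cur.execute("SELECT id FROM User WHERE name = ?", (name,))
    user_id = cur.fetchone()[0]

    cur.execute("INSERT OR IGNORE INTO Course (title) VALUES (?)", (title,))
    cur.execute("SELECT id FROM Course WHERE title = ?", (title,))
    course_id = cur.fetchone()[0]

    cur.execute(
        "INSERT OR REPLACE INTO Member (user_id, course_id, role) VALUES (?, ?, ?)",
        (user_id, course_id, role),
    )
conn.commit()

# Join the three tables back into readable rows.
rows = cur.execute("""
    SELECT User.name, Course.title, Member.role
    FROM Member
    JOIN User   ON Member.user_id = User.id
    JOIN Course ON Member.course_id = Course.id
    ORDER BY User.name, Course.title
""").fetchall()
print(rows)
```

Using `?` placeholders rather than string formatting lets `sqlite3` handle quoting of the values.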
- Description: An intelligent solver for a Wordle-like game. The script reads the game state from an image, filters a word list based on game rules and feedback, and makes a strategic next guess.
- Concepts: Algorithmic thinking, working with external libraries (`PIL`), data filtering, and robust programming with fallback logic.
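The word-filtering step can be illustrated with a short sketch. It leaves out the image-reading part entirely and assumes the feedback has already been turned into a string of `g` (green), `y` (yellow), and `b` (gray) characters; that encoding, the function names, and the word list are hypothetical and not taken from the solver in this repository.

```python
def feedback(guess, answer):
    """Score guess against answer: 'g' green, 'y' yellow, 'b' gray."""
    result = ["b"] * len(guess)
    remaining = {}
    # First pass: mark greens and count the answer letters left unmatched.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            remaining[a] = remaining.get(a, 0) + 1
    # Second pass: mark yellows while unmatched copies of the letter remain.
    for i, g in enumerate(guess):
        if result[i] == "b" and remaining.get(g, 0) > 0:
            result[i] = "y"
            remaining[g] -= 1
    return "".join(result)


def filter_words(words, guess, observed):
    """Keep only words that would have produced the observed feedback."""
    return [w for w in words if feedback(guess, w) == observed]


# Made-up word list and game state for illustration.
words = ["crane", "crate", "trace", "slate", "brace"]
candidates = filter_words(words, "crane", "gggbg")
print(candidates)
```

Scoring each candidate as if it were the answer and comparing the result with the observed feedback handles repeated letters without separate special cases.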
Make sure you have Python 3 installed on your system.
python --version
- Clone the repository to your local machine:
git clone https://github.com/your-username/Introduction-to-Data-Science-in-Python.git
cd Introduction-to-Data-Science-in-Python
- It's highly recommended to use a virtual environment to manage dependencies for each project.
# Create a virtual environment
python -m venv venv
# Activate it (on Windows)
.\venv\Scripts\activate
# Activate it (on macOS/Linux)
source venv/bin/activate
- Install the required packages for a specific project. Some projects may have a `requirements.txt` file.
pip install -r path/to/project/requirements.txt
To run any script, navigate to its directory and execute it with Python. For example, to run the database roster script:
python roster.py
Make sure any required data files (e.g., `roster_data.json`) are in the same directory as the script.
While this is primarily a portfolio of course assignments, suggestions for improvements or alternative solutions are welcome. Feel free to fork the repository and submit a pull request with your ideas!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/ImprovedSolution`)
- Commit your Changes (`git commit -m 'Add an improved solution for X'`)
- Push to the Branch (`git push origin feature/ImprovedSolution`)
- Open a Pull Request
This project is licensed under the MIT License. See the `LICENSE.md` file for details.