This repository provides evidence of Python and pandas data analysis skills developed through practice exercises. The notebooks showcase core and intermediate Python programming concepts as well as data manipulation and analysis using pandas.

Python, pandas, Data manipulation, Matplotlib, Seaborn, Data cleaning, Markdown, Jupyter Notebook, Google Colab, Data visualisation
- Language: Python
- Libraries: pandas
- Environment: Jupyter Notebook, Google Colab

- 01_python_fundamentals.ipynb
  - Covers core Python concepts used in data analysis, including variables, data types, user input/output, conditional statements, arithmetic operations and basic number manipulation.
  - Worksheet
- 02_python_intermediate.ipynb
  - Covers intermediate Python concepts, including loops, nested loops, error checking, number manipulation, and building mini-programs (a calculator, factorial and prime number checkers, and pattern printing).
  - Worksheet
- 02_pandas_dataframes
  A collection of pandas exercise notebooks focused on data manipulation and analysis.
  - 02a_pandas_basics.ipynb
    - Core pandas functionality, including reading Excel files, exploring and inspecting DataFrames, understanding dataset structure and summary statistics, selecting and slicing data, using `loc` and `iloc`, Boolean filtering, sorting values and calculating summary statistics.
    - Worksheet
  - 02b_pandas_dataframes_exercises_part1.ipynb
    - Covers practical exercises with pandas DataFrames, including creating DataFrames from lists and dictionaries, inspecting data, renaming and adding columns, performing arithmetic operations, converting data types, calculating total revenue, rounding numeric columns, and exporting cleaned data to CSV and Excel.
    - Worksheet
  - 02c_pandas_dataframes_exercises_part2.ipynb
    - Focuses on practical DataFrame manipulation and analysis, including importing CSV files, creating new calculated columns, Boolean filtering and aggregation, calculating percentages, and sorting.
    - Worksheet
  - 02d_pandas_missing_data.ipynb
    - Covers handling missing data in pandas, including reading CSV files, counting missing values by column, calculating the proportion of missing values by row, and sorting to identify mostly empty rows. The exercises use the penguins dataset to practise detecting and understanding missing data, reinforcing data cleaning and exploration skills.
    - Worksheet

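
The missing-data steps described for `02d_pandas_missing_data.ipynb` can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the inline CSV is a made-up sample in the style of the penguins dataset, and the variable names (`missing_by_column`, `missing_by_row`, `emptiest_rows`) are assumptions chosen for readability.

```python
import io

import pandas as pd

# Made-up sample rows in the style of the penguins dataset (not real measurements).
csv_text = """species,island,bill_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,3750,male
Adelie,Torgersen,,,
Gentoo,Biscoe,46.1,,female
Chinstrap,Dream,,3700,
"""

# Read the CSV text; empty fields are parsed as NaN by default.
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values in each column.
missing_by_column = df.isna().sum()

# Proportion of missing values in each row (mean of True/False across columns).
missing_by_row = df.isna().mean(axis=1)

# Sort so that the mostly empty rows appear first.
emptiest_rows = missing_by_row.sort_values(ascending=False)

print(missing_by_column)
print(emptiest_rows)
```

In the notebook itself the data would come from a CSV file on disk rather than an inline string; the in-memory text is used here only so the sketch runs on its own.
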
Completing these exercises helped to:
- Consolidate understanding of Python programming and pandas data manipulation.
- Build confidence in writing clean, reproducible code.
- Gain experience exploring and transforming datasets.