pelusok/data_wrangling

Unit 5: Data Wrangling

Before performing analysis and running algorithms, it is essential to collect, clean, and transform imperfect data into usable datasets.

Learning Objectives

  • Become proficient at data manipulation using pandas and other Python packages as needed.
  • Work with missing or invalid values.
  • Extract and manipulate data in formats such as XML and JSON.
  • Work with SQL-based databases and write basic SQL queries, up to basic aggregations and joins.

Projects

SQL Case Study:

The SQL Case Study provides hands-on SQL experience with a database that contains three tables. SQL query techniques used include:

  • Table joins
  • Aggregation
  • Creation of new columns
  • Code
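
The three techniques listed above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the case study's actual queries: the `members`, `facilities`, and `bookings` tables, their columns, and the sample rows are all hypothetical stand-ins for the three-table database.

```python
import sqlite3

# In-memory database with three hypothetical tables (not the case study's schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (member_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE facilities (facility_id INTEGER PRIMARY KEY, name TEXT, cost_per_slot REAL);
    CREATE TABLE bookings (booking_id INTEGER PRIMARY KEY, member_id INTEGER,
                           facility_id INTEGER, slots INTEGER);
    INSERT INTO members VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO facilities VALUES (1, 'Tennis Court', 5.0), (2, 'Pool Table', 2.0);
    INSERT INTO bookings VALUES (1, 1, 1, 2), (2, 1, 2, 1), (3, 2, 1, 3);
""")

query = """
    SELECT m.first_name || ' ' || m.last_name AS member_name,  -- new column
           SUM(b.slots * f.cost_per_slot)     AS total_cost    -- aggregation
    FROM bookings AS b
    JOIN members    AS m ON m.member_id   = b.member_id        -- table joins
    JOIN facilities AS f ON f.facility_id = b.facility_id
    GROUP BY m.member_id
    ORDER BY total_cost DESC
"""

rows = conn.execute(query).fetchall()
for member_name, total_cost in rows:
    print(member_name, total_cost)

conn.close()
```

Concatenating the name fields creates a new column, the two `JOIN` clauses link all three tables, and `SUM` with `GROUP BY` aggregates cost per member.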

JSON Mini Project:

The JSON Mini Project examines the World Bank dataset, derived from a school improvement project in Ethiopia. Three tasks are completed in this project:

  1. Identify the top 10 countries with the most projects
  2. Identify the top 10 project themes
  3. Find and replace missing project theme name values
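
As a rough sketch of the three tasks, the snippet below uses only the standard library on a tiny made-up sample. It assumes each project record has a `countryname` field and a `mjtheme_namecode` list of `{"code", "name"}` entries; these field names and all sample values are assumptions for illustration, not data from the project.

```python
import json
from collections import Counter

# Made-up sample records; field names are assumed, values are placeholders.
raw = """
[
  {"countryname": "Country A",
   "mjtheme_namecode": [{"code": "1", "name": "Theme X"}, {"code": "2", "name": ""}]},
  {"countryname": "Country B",
   "mjtheme_namecode": [{"code": "2", "name": "Theme Y"}]},
  {"countryname": "Country A",
   "mjtheme_namecode": [{"code": "1", "name": ""}]}
]
"""
projects = json.loads(raw)

# Task 1: countries with the most projects.
top_countries = Counter(p["countryname"] for p in projects).most_common(10)

# Task 3 (done before counting themes): build a code -> name lookup from
# entries that have a name, then fill in the blank names.
code_to_name = {
    theme["code"]: theme["name"]
    for p in projects
    for theme in p["mjtheme_namecode"]
    if theme["name"]
}
for p in projects:
    for theme in p["mjtheme_namecode"]:
        if not theme["name"]:
            theme["name"] = code_to_name.get(theme["code"], "")

# Task 2: most common project themes, counted after the names are filled in.
theme_counts = Counter(
    theme["name"] for p in projects for theme in p["mjtheme_namecode"]
)
top_themes = theme_counts.most_common(10)

print(top_countries)
print(top_themes)
```

Filling the missing names before counting themes keeps blank entries from being tallied as a separate theme.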

API Mini Project:

Using Quandl.com, request an API key and pull data from the Frankfurt Stock Exchange. Analyze the data without third-party packages such as pandas.

  1. Collect the appropriate dataset
  2. Convert the JSON object to a Python dictionary
  3. Analyze the dataset, looking at metrics such as the median trading volume for the year and the largest price change in any one day
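
Steps 2 and 3 might look like the sketch below, which uses only the standard library. To stay runnable offline it parses a hard-coded JSON string instead of calling the API. The payload layout (`dataset_data` with `column_names` and `data`), the column names, and every number are illustrative assumptions, not real exchange data.

```python
import json
from statistics import median

# Hypothetical response body; structure and values are assumed for illustration.
payload = """
{"dataset_data": {
  "column_names": ["Date", "Open", "High", "Low", "Close", "Traded Volume"],
  "data": [
    ["2017-01-04", 10.0, 11.5, 9.5, 11.0, 1200],
    ["2017-01-03", 9.0, 10.5, 8.9, 10.0, 800],
    ["2017-01-02", null, 9.2, 8.8, 9.0, 1000]
  ]}}
"""

# Step 2: convert the JSON object to a Python dictionary.
response = json.loads(payload)
dataset = response["dataset_data"]

# Pair each row with the column names so fields can be read by name.
columns = dataset["column_names"]
records = [dict(zip(columns, row)) for row in dataset["data"]]

# Step 3a: median trading volume, skipping rows with a missing value.
volumes = [r["Traded Volume"] for r in records if r["Traded Volume"] is not None]
median_volume = median(volumes)

# Step 3b: largest one-day price change, taken here as High minus Low.
largest_intraday_change = max(
    r["High"] - r["Low"]
    for r in records
    if r["High"] is not None and r["Low"] is not None
)

print(median_volume, largest_intraday_change)
```

The `None` checks matter because JSON `null` values become `None` in Python and would otherwise raise a `TypeError` during arithmetic.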
