## About this Part

Congratulations! You've reached the final Part of this Sprint.
This Part serves as an integrative experience, allowing you to apply the knowledge and skills acquired in this and previous Sprints.

As the culmination of this Sprint, you'll construct a system that routinely retrieves weather data and stores it in a Relational Database Management System (RDBMS).

Sprint projects often necessitate the use of skills, tools, and techniques not explicitly covered during the Sprint.
This is intentional, as true expertise stems from the ability to recognize the skills needed to solve a given problem and to acquire these skills as necessary.

Remember, perfection isn't expected at this stage.
You will continuously hone your skills and have ample opportunities to apply them in future projects.
For now, focus on leveraging what you've learned and giving it your best effort!

*Note:* [advice on building your portfolio](https://turingcollege.atlassian.net/wiki/spaces/DLG/pages/1002307695/Portfolio+Items)

## Context

You've recently joined a non-profit organization aimed at creating a weather simulator to better comprehend global warming effects.
Your team consists of scientists eager to validate the weather simulator's results, which they've recently finished developing.

Validating the initial results requires hourly weather data from the twenty largest cities in Europe, stored in a SQL database table.
The data will then be connected to the simulator.
The simulator predicts temperature (in Celsius) and generates weather condition descriptions such as rainy or clear.
Both aspects need validation.

Because of the simulator's nature, acquiring the data promptly and maintaining all historical data is crucial.
Your solution should be scalable to accommodate an increasing number of cities worldwide in the future.

Following several refinement and planning meetings, you've agreed to construct a system that queries a weather API (such as OpenWeatherMap API) every hour.
The results (temperature and description) are to be stored in an RDBMS of your choice.
Because latency matters to the simulator, you decided to fetch the data with a suitable concurrency method (threads, processes, or coroutines). You are expected to justify your choice to the wider engineering team.
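As a starting point for the concurrent fetch described above, here is a minimal sketch using threads. The names `fetch_all`, `fake_fetch`, and the `Reading` tuple are hypothetical and not part of any weather API; `fake_fetch` stands in for a real HTTP request so the example runs without network access or an API key.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

# Hypothetical shape of one reading: (city, temperature in Celsius, description).
Reading = tuple[str, float, str]


def fetch_all(
    cities: Iterable[str],
    fetch_one: Callable[[str], Reading],
    max_workers: int = 20,
) -> list[Reading]:
    """Call fetch_one for every city concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map() yields results in the order of the input iterable.
        return list(pool.map(fetch_one, cities))


def fake_fetch(city: str) -> Reading:
    """Stand-in for a real API call (assumption: replace with your HTTP client)."""
    return (city, 12.5, "clear")


readings = fetch_all(["Berlin", "Madrid"], fake_fetch)
```

Passing the fetch function as a parameter keeps the concurrency code independent of the API client, which may make it easier to swap in processes or coroutines later and to test without network access.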

The simulator also requires additional analytical information:

- Maximum, minimum, and standard deviation of temperatures per country and city for today, yesterday, current week, and last seven days.
- The cities with the highest and lowest temperatures for each hour, day, and week.
- The number of hours in which it rained during the last day and the last week.
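One way to approach the analytical requirements listed above is a SQL view over the raw readings. The sketch below uses Python's built-in `sqlite3` module with an in-memory database so it runs anywhere. The table name `weather`, its columns, the view name `daily_stats`, and the sample rows are all assumptions made for illustration. SQLite has no built-in standard deviation function, so the view exposes the population variance and the square root is taken in Python; other engines, such as PostgreSQL with `stddev_samp` and `stddev_pop`, provide it directly.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per city per hour.
conn.execute(
    """CREATE TABLE weather (
           country TEXT, city TEXT, observed_at TEXT,
           temperature_c REAL, description TEXT)"""
)
# Made-up sample rows with fixed timestamps to keep the example deterministic.
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?, ?, ?)",
    [
        ("Germany", "Berlin", "2024-01-01 10:00:00", 2.0, "rain"),
        ("Germany", "Berlin", "2024-01-01 11:00:00", 4.0, "clear"),
        ("Spain", "Madrid", "2024-01-01 10:00:00", 9.0, "clear"),
        ("Spain", "Madrid", "2024-01-01 11:00:00", 11.0, "rain"),
    ],
)
# Daily max, min, population variance, and rainy-hour count per country and city.
conn.execute(
    """CREATE VIEW daily_stats AS
       SELECT country, city, date(observed_at) AS day,
              MAX(temperature_c) AS max_c,
              MIN(temperature_c) AS min_c,
              AVG(temperature_c * temperature_c)
                  - AVG(temperature_c) * AVG(temperature_c) AS variance_c,
              SUM(description LIKE '%rain%') AS rainy_hours
       FROM weather
       GROUP BY country, city, date(observed_at)"""
)
rows = conn.execute("SELECT * FROM daily_stats ORDER BY city").fetchall()
# Standard deviation derived from the variance column (index 5).
stddevs = {row[1]: math.sqrt(max(row[5], 0.0)) for row in rows}
```

The same pattern could be extended with further views for yesterday, the current week, and the last seven days by filtering on `observed_at`.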

For all scheduling tasks, you'll use cron, as more advanced orchestrators (e.g., Airflow, Dagster, Prefect, etc.) are not permitted by the SRE team.
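To illustrate the cron scheduling mentioned above, here is a sketch of an entry-point script together with the crontab line that might run it at the start of every hour. The file paths, the script name `collect.py`, and the helper `run_timestamp` are hypothetical placeholders.

```python
# Hypothetical crontab entry (edit with `crontab -e`); all paths are placeholders:
#   0 * * * * /usr/bin/python3 /opt/weather/collect.py >> /var/log/weather/collect.log 2>&1
# The five fields are minute, hour, day of month, month, and day of week,
# so "0 * * * *" means minute 0 of every hour.
from datetime import datetime, timezone


def run_timestamp(now: datetime) -> datetime:
    """Truncate to the start of the hour so all rows from one run share a timestamp."""
    return now.replace(minute=0, second=0, microsecond=0)


def main() -> None:
    stamp = run_timestamp(datetime.now(timezone.utc))
    # Assumption: the real script would fetch and store the readings here.
    print(f"collecting weather for {stamp.isoformat()}")


if __name__ == "__main__":
    main()
```

Redirecting both standard output and standard error to a log file in the crontab line is one simple way to see what happened during unattended runs.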

Note: Depending on the weather API you use, you might not have access to historical data. If that's the case, add generated (fake) data to demonstrate your solution's functionality.
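If you do need generated data, a seeded random generator keeps it reproducible. The sketch below is one possible approach; the function name `generate_history`, the temperature range, the description values, and the trailing `is_synthetic` flag are all assumptions chosen for illustration.

```python
import random
from datetime import datetime, timedelta


def generate_history(city: str, end: datetime, hours: int, seed: int = 0) -> list[tuple]:
    """Return `hours` fake hourly readings for `city`, ending one hour before `end`."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    descriptions = ["clear", "clouds", "rain"]  # assumed set of conditions
    rows = []
    for offset in range(hours, 0, -1):
        observed_at = end - timedelta(hours=offset)
        temperature_c = round(rng.uniform(-5.0, 30.0), 1)  # arbitrary plausible range
        # The final True marks the row as synthetic so it can be filtered out later.
        rows.append((city, observed_at, temperature_c, rng.choice(descriptions), True))
    return rows


history = generate_history("Berlin", datetime(2024, 1, 2, 0, 0), hours=24)
```

Marking generated rows explicitly helps keep them separate from real observations when validating the simulator.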

### Twenty largest cities in Europe

1. Istanbul, Turkey: 10,241,510
1. London, United Kingdom: 8,799,800
1. Saint Petersburg, Russia: 5,598,486
1. Berlin, Germany: 3,850,809
1. Madrid, Spain: 3,305,408
1. Kyiv, Ukraine: 2,952,301
1. Rome, Italy: 2,749,031
1. Bucharest, Romania: 2,161,347
1. Paris, France: 2,102,650
1. Minsk, Belarus: 1,996,553
1. Vienna, Austria: 1,982,442
1. Warsaw, Poland: 1,862,345
1. Hamburg, Germany: 1,853,935
1. Budapest, Hungary: 1,706,851
1. Belgrade, Serbia: 1,688,667
1. Barcelona, Spain: 1,636,732
1. Munich, Germany: 1,487,708
1. Kharkiv, Ukraine: 1,421,225
1. Milan, Italy: 1,349,930

Figures are city populations. Source: Wikipedia

## Objectives for this Part

- Practice scheduling with cron.
- Gain hands-on experience with Python threads, coroutines, and processes.
- Practice utilizing SQL views and analytical functions.
- Practice RDBMS administration.

## Requirements

- Your solution should encompass the functionality outlined in the Context section.
- Offer insights on how your analysis could be improved.

Note: You might have noticed that the requirements are less specific this time. This is intentional.

## Bonus

- Benchmark at least three concurrent methods for data retrieval (coroutines, threads, and processes) and compare their performance. Ensure it is easy to switch between these methods for the weather data retrieval.
- Implement regular backups for the database, with backups made every hour, storing the last twenty-four backups.
- Implement logging so that you can follow the system's progress while a task is running and debug any issues that occur. You could use Python's `logging` library for this.
- Implement a monitoring solution that visualizes the metrics that represent the health of your database.
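For the hourly backup item above, keeping only the last twenty-four backups comes down to deleting the oldest files after each run. The following is a minimal sketch, assuming backups are written to a single directory with file names that sort chronologically (for example, names containing a zero-padded timestamp). The function name `prune_backups` and the `*.dump` pattern are hypothetical.

```python
from pathlib import Path


def prune_backups(backup_dir: Path, keep: int = 24, pattern: str = "*.dump") -> list[Path]:
    """Delete all but the `keep` newest backups and return the deleted paths.

    Assumes file names sort chronologically, e.g. weather_2024-01-01T10.dump.
    """
    newest_first = sorted(backup_dir.glob(pattern), key=lambda p: p.name, reverse=True)
    stale = newest_first[keep:]
    for path in stale:
        path.unlink()
    return stale
```

A function like this could be called at the end of the backup script that cron runs every hour.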

## Evaluation Criteria

- Adherence to the requirements. How well did you meet the requirements?
- Code quality. Was your code well-structured? Did you use the appropriate levels of abstraction? Did you remove commented-out and unused code?
- System design. Did your solution use suitable technologies, tools, software architecture, and algorithms?
- Presentation quality. How comprehensive is your presentation, and how well are you able to explain your solution to the target audience?
- Conceptual understanding. How well do you know the concepts covered in this and previous Sprints?

## Project Review

During your project presentation, assume that your audience is a data scientist involved in building the machine learning model for your team. They are expected to have strong data science skills and decent software engineering knowledge - they will understand technical jargon but may not notice areas for improvement or question your decisions. Since they are familiar with the problem, avoid explaining trivial concepts or simple code snippets; focus on technological and design choices, as well as the end-user functionality of your solution.

During the project review, you might be asked questions to test your understanding of the covered topics, such as:

- What is the purpose of cron, and how are cron jobs written?
- What types of joins are available in SQL, and how do they differ?
- What are the most common functions for working with dates in SQL?
- What is a coroutine, what are its uses, and why do coroutines need to be primed?
- Is the use of try-except-else considered good practice in Python?

IMPORTANT: during the project review, you will also be asked to solve an exercise using Python.


## General Project Review Guidelines

For an in-depth explanation about how project reviews work at Turing College, please read [this doc](https://turingcollege.atlassian.net/wiki/spaces/DLG/pages/537395951/Peer+expert+reviews+corrections).
