# Introduction - Prediction of solar power production in Suvilahti PV plant
Description: An example project that predicts solar power production at the Suvilahti PV plant for the upcoming 24-36 hours. The model is trained on FMI weather observations as features and Helen PV production data as labels. At prediction time, the model inputs are weather forecasts from the FMI API.

### A list of sequential notebooks:
The current stage is marked in bold.

* Data gathering
* Data analysis
* Feature engineering
* Feature selection
* __Model building <-__
* Model deployment

# Checklist

>Source: Checklist from __Hands-On Machine Learning with Scikit-Learn and TensorFlow__ by *Aurélien Géron*

The checklist guides you through the project. There are 8 main steps, each with its corresponding notebook:  

- [x] Frame the problem and look at the big picture. -> __Below__.
- [x] Get the data.  -> __data_gathering__ notebook
- [x] Explore the data to gain insights.  -> __data_analysis__ notebook
- [ ] Prepare the data to better expose the underlying data patterns to ML algorithms. -> __feature_engineering__ and __feature_selection__ notebooks  
- [x] Explore many different models and short-list the best ones. -> __model_building__ notebook  
- [ ] Fine-tune your models and combine them into a great solution.  
- [ ] Present your solution.  
- [ ] Launch, monitor, and maintain your system. -> __model_deployment__ notebook

# 1. Frame the problem and look at the big picture  
1. Define the objective (in business terms).  

> The objective is to use weather forecast data to predict PV energy production, so that users can explore the expected profits from the generation.

2. How will your solution be used? 

>Users go to a web app or call its API to GET a prediction of PV energy production for the next day or the next hours. First, the app fetches a weather forecast from the FMI API for either __Suvilahti__ or the closest weather station, __Kumpula__. Then, the forecast values are used as predictors for the ML model of energy production.
 
3. What are the current solutions/workarounds (if any)?  

> * The FMI (Finnish Meteorological Institute) provides an API that returns weather forecasts in a structured format. The FMI forecast contains a multitude of weather-related parameters. 
> * For training the model on energy production, the Helen Energia website dedicated to the PV panels in Suvilahti and Messukeskus lets you download up to 4 years of time series data, structured as follows: timestamp, energy produced over the hour in kWh. The data is in hourly resolution.  
> * The FMI website also allows you to download historical hourly weather data, which makes it a perfect candidate for our mini-project.
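
Since both sources are hourly, features and labels can be paired by timestamp. The following is a minimal sketch of that pairing, not the code used in the notebooks: the column names (`cloud_cover`, `air_temp_c`, `energy_kwh`), the timestamp format, and all sample values are invented for illustration.

```python
# Hypothetical hourly records; field names and values are invented for illustration.
weather = [
    {"timestamp": "2019-06-01T10:00", "cloud_cover": 2, "air_temp_c": 18.5},
    {"timestamp": "2019-06-01T11:00", "cloud_cover": 5, "air_temp_c": 19.1},
    {"timestamp": "2019-06-01T12:00", "cloud_cover": 7, "air_temp_c": 19.4},
]
production = [
    {"timestamp": "2019-06-01T10:00", "energy_kwh": 120.0},
    {"timestamp": "2019-06-01T11:00", "energy_kwh": 95.0},
    # No production record for 12:00, so that hour cannot be used for training.
]


def build_training_rows(weather_rows, production_rows):
    """Pair weather features with production labels that share a timestamp."""
    labels = {row["timestamp"]: row["energy_kwh"] for row in production_rows}
    features, targets = [], []
    for row in weather_rows:
        ts = row["timestamp"]
        if ts not in labels:
            continue  # skip hours that have no label
        # Keep every weather field except the timestamp as a feature.
        features.append([value for key, value in row.items() if key != "timestamp"])
        targets.append(labels[ts])
    return features, targets


X, y = build_training_rows(weather, production)
print(X)  # [[2, 18.5], [5, 19.1]]
print(y)  # [120.0, 95.0]
```

Hours present in only one of the two sources are dropped here; whether to drop or impute them is a design choice that this sketch does not settle.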

4. How should you frame this problem (supervised/unsupervised, online/offline, etc.)  

>Supervised learning, regression task, model-based, with data arriving in batches.

5. How should performance be measured? 

>Performance is measured by the Root Mean Squared Error (RMSE) (or MSE?), since it is more sensitive to outliers than, for example, MAE.
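
The outlier sensitivity mentioned above can be illustrated with a small worked example. This is only a sketch with made-up production values: the two prediction sets have the same total absolute error, but in one of them the error is concentrated in a single hour.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Made-up hourly production values in kWh, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
even_errors = [12.0, 18.0, 32.0, 38.0]  # every hour is off by 2 kWh
one_outlier = [10.0, 20.0, 30.0, 48.0]  # a single hour is off by 8 kWh

print(mae(actual, even_errors), rmse(actual, even_errors))  # 2.0 2.0
print(mae(actual, one_outlier), rmse(actual, one_outlier))  # 2.0 4.0
```

MAE is identical in both cases, while RMSE doubles when the error is concentrated in one hour. RMSE is also expressed in the same unit as the target (kWh), whereas MSE is in kWh squared.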
 
6. Is the performance measure aligned with the business objective?

> is it?

7. What would be the minimum performance needed to reach the business objective?

>We select accuracy of 90-95% on the test set as a benchmark (really?)
  
8. What are comparable problems? Can you reuse experience or tools?  

>There are several research articles on the topic. However, since this is a portfolio project targeted at getting a job, we will not use any specialized Python libraries that could provide accurate values for some of the variables, or even predict the PV power itself.
 
9. Is human expertise available?  

> There are ways to model this without an ML algorithm; however, they are outside the scope of this project. 

10. How would you solve the problem manually?  

> Weather data correlates with the solar irradiance received at the Earth's surface, and PV energy production is proportional to solar irradiance. 

> The model built in this project is designed to predict the produced PV energy. We use solar radiation, which depends on the position of the sun for the given day of the year and hour of the day, and combine it with weather forecast data such as cloud cover, humidity, pressure, and air temperature.
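
How the sun's position depends on the day of the year and the hour of the day can be sketched with textbook solar geometry. The snippet below is an assumption-laden illustration, not necessarily the feature used in the notebooks: it uses Cooper's approximation for the solar declination, takes the hour as local solar time rather than clock time, and assumes a latitude of roughly 60.2°N for Helsinki. The function names are hypothetical.

```python
import math


def solar_declination_deg(day_of_year):
    """Approximate solar declination in degrees (Cooper's formula)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def solar_elevation_deg(latitude_deg, day_of_year, solar_hour):
    """Approximate solar elevation angle in degrees.

    solar_hour is local solar time (12.0 = sun at its highest point),
    which differs from clock time; that correction is ignored here.
    """
    lat = math.radians(latitude_deg)
    decl = math.radians(solar_declination_deg(day_of_year))
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))  # 15 degrees per hour
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    # Clamp against floating-point drift before taking the arcsine.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))


HELSINKI_LAT = 60.2  # approximate latitude, assumed for illustration

summer_noon = solar_elevation_deg(HELSINKI_LAT, 172, 12.0)   # around the June solstice
winter_noon = solar_elevation_deg(HELSINKI_LAT, 355, 12.0)   # around the December solstice
winter_night = solar_elevation_deg(HELSINKI_LAT, 355, 0.0)
print(round(summer_noon, 1), round(winter_noon, 1), round(winter_night, 1))
```

With these assumptions the sun reaches roughly 53° above the horizon at solar noon in midsummer but only about 6° in midwinter, and the elevation is negative at night. A feature of this kind gives the model the clear-sky upper bound that the weather variables then modulate.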

11. List the assumptions you or others have made so far.  

>* The weather forecast that the model uses in production is the closest available information to the real weather values for that day.
>* We receive weather observation data for training from Kumpula, while the site we are predicting for is in Suvilahti; the two are about 2.5-3 km apart.

12. Verify assumptions if possible.  

> There is no better source of information about the upcoming days than the weather forecast.


  
# Present your solution  
1. Document what you have done.  
2. Create a nice presentation.  
    - Make sure you highlight the big picture first.  
3. Explain why your solution achieves the business objective.  
4. Don't forget to present interesting points you noticed along the way.  
    - Describe what worked and what did not.  
    - List your assumptions and your system's limitations.  
5. Ensure your key findings are communicated through beautiful visualizations or easy-to-remember statements (e.g., "the median income is the number-one predictor of housing prices").  
