# Unveiling Trends in Renewable Energy 🌍🔋
### The Data Scientist Master

## 📖 Background
The race to net-zero emissions is heating up. As nations work to combat climate change and meet rising energy demands, renewable energy has emerged as a cornerstone of the clean transition. Solar, wind, and hydro are revolutionizing how we power our lives. Some countries are leading the charge, while others are falling behind. But which nations are making the biggest impact? What’s driving their success? And what lessons can we learn to accelerate the green energy transition?

As a data scientist at NextEra Energy, one of the world’s leading renewable energy providers, your role is to move beyond exploration and into prediction. Using a rich, real-world dataset, you’ll build models to forecast renewable energy production, drawing on indicators such as GDP, population, carbon emissions, and policy metrics.

With the world watching, your model could help shape smarter investments, forward-thinking policies, and a faster transition to clean energy. 🔮⚡🌱

## 💾 The data 
Your team has gathered a **global renewable energy dataset** ("Training_set_augumented.csv") covering energy production, investments, policies, and economic factors shaping renewable adoption worldwide:  

#### 🌍 Basic Identifiers  
- **`Country`** – Country name  
- **`Year`** – Calendar year (YYYY)  
- **`Energy Type`** – Type of renewable energy (e.g., Solar, Wind)  

#### ⚡ Energy Metrics  
- **`Production (GWh)`** – Renewable energy produced (Gigawatt-hours)  
- **`Installed Capacity (MW)`** – Installed renewable capacity (Megawatts)  
- **`Investments (USD)`** – Total investment in renewables (US Dollars)  
- **`Energy Consumption (GWh)`** – Total national energy use  
- **`Energy Storage Capacity (MWh)`** – Capacity of energy storage systems  
- **`Grid Integration Capability (Index)`** – Scale of 0–1; the grid's ability to handle renewables  
- **`Electricity Prices (USD/kWh)`** – Average cost of electricity  
- **`Energy Subsidies (USD)`** – Government subsidies for energy sector  
- **`Proportion of Energy from Renewables (%)`** – Share of renewables in total energy mix  

#### 🧠 Innovation & Tech  
- **`R&D Expenditure (USD)`** – R&D spending on renewables  
- **`Renewable Energy Patents`** – Number of patents filed  
- **`Innovation Index (Index)`** – Global innovation score (0–100)  

#### 💰 Economy & Policy  
- **`GDP (USD)`** – Gross domestic product  
- **`Population`** – Total population  
- **`Government Policies`** – Number of policies supporting renewables  
- **`Renewable Energy Targets`** – Whether national targets are in place (1 = Yes, 0 = No)  
- **`Public-Private Partnerships in Energy`** – Number of active collaborations  
- **`Energy Market Liberalization (Index)`** – Scale of 0–1  

#### 🧑‍🤝‍🧑 Social & Governance  
- **`Ease of Doing Business (Score)`** – World Bank index (0–100)  
- **`Regulatory Quality`** – Governance score (-2.5 to 2.5)  
- **`Political Stability`** – Governance score (-2.5 to 2.5)  
- **`Control of Corruption`** – Governance score (-2.5 to 2.5)  

#### 🌿 Environment & Resources  
- **`CO2 Emissions (MtCO2)`** – Emissions in million metric tons  
- **`Average Annual Temperature (°C)`** – Country’s average annual temperature  
- **`Solar Irradiance (kWh/m²/day)`** – Solar energy availability  
- **`Wind Speed (m/s)`** – Average wind speed  
- **`Hydro Potential (Index)`** – Relative hydropower capability (0–1)  
- **`Biomass Availability (Tons/year)`** – Total available biomass  
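
A minimal sketch for loading the data with Python's standard `csv` module and checking that the header matches the column list above. It assumes the CSV header uses exactly these column names; `EXPECTED_COLUMNS`, `TEXT_COLUMNS`, `missing_columns`, and `read_rows` are hypothetical names, and treating every column other than `Country`, `Energy Type`, and the Test Set's `ID` as numeric is an assumption based on the descriptions above.

```python
import csv
from collections.abc import Iterable

# Column names as listed in the data dictionary (assumed to match the CSV header exactly).
EXPECTED_COLUMNS = [
    "Country", "Year", "Energy Type",
    "Production (GWh)", "Installed Capacity (MW)", "Investments (USD)",
    "Energy Consumption (GWh)", "Energy Storage Capacity (MWh)",
    "Grid Integration Capability (Index)", "Electricity Prices (USD/kWh)",
    "Energy Subsidies (USD)", "Proportion of Energy from Renewables (%)",
    "R&D Expenditure (USD)", "Renewable Energy Patents", "Innovation Index (Index)",
    "GDP (USD)", "Population", "Government Policies", "Renewable Energy Targets",
    "Public-Private Partnerships in Energy", "Energy Market Liberalization (Index)",
    "Ease of Doing Business (Score)", "Regulatory Quality", "Political Stability",
    "Control of Corruption",
    "CO2 Emissions (MtCO2)", "Average Annual Temperature (°C)",
    "Solar Irradiance (kWh/m²/day)", "Wind Speed (m/s)", "Hydro Potential (Index)",
    "Biomass Availability (Tons/year)",
]

# Identifier and text columns kept as strings (assumption); everything else is parsed as float.
TEXT_COLUMNS = {"ID", "Country", "Energy Type"}


def missing_columns(header: Iterable[str]) -> list[str]:
    """Return expected columns absent from the header, in data-dictionary order."""
    present = set(header)
    return [col for col in EXPECTED_COLUMNS if col not in present]


def read_rows(lines: Iterable[str]) -> list[dict]:
    """Parse CSV lines into dicts, converting numeric cells to float and blanks to None."""
    rows = []
    for raw in csv.DictReader(lines):
        row = {}
        for col, value in raw.items():
            if col is None:
                continue  # extra unnamed fields on a malformed line are ignored
            if col in TEXT_COLUMNS:
                row[col] = value
            elif value is None or value.strip() == "":
                row[col] = None  # keep missing values explicit for later cleaning
            else:
                row[col] = float(value)
        rows.append(row)
    return rows
```

To use it, open the file with `open("Training_set_augumented.csv", newline="", encoding="utf-8")` (the UTF-8 encoding is an assumption), pass the handle to `read_rows`, and call `missing_columns` on the keys of the first row. Blank cells become `None`, so missing values stay visible during cleaning.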

## 💪 Challenge

As a data scientist at NextEra Energy, your task is to use the **Training Set** (80% of the data) to train a powerful machine learning model that can predict **renewable energy production (GWh)**. Once your model is trained, you will use it to generate predictions for the **Test Set**, which does not include the target (`Production (GWh)`) but has an additional **`ID` column**.
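
Because the Test Set's target values are hidden, one common approach is to hold out part of the Training Set for local validation. The sketch below does a seeded random split; the 20% validation fraction (mirroring the overall 80/20 split) and the `train_validation_split` name are assumptions, not part of the competition setup. If you suspect the Test Set was split by country or year rather than at random, a grouped split may give a more realistic estimate.

```python
import random
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")


def train_validation_split(
    rows: Sequence[T], valid_fraction: float = 0.2, seed: int = 42
) -> tuple[list[T], list[T]]:
    """Shuffle rows with a fixed seed and return (train, validation) lists."""
    if not 0.0 < valid_fraction < 1.0:
        raise ValueError("valid_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # local RNG leaves the global random state untouched
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]
```

Fixing the seed makes the split reproducible, so validation scores from different model versions are compared on the same rows.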

### 🚀 Your Task:

1. **Train Your Model**:

   * Use the **Training Set**, which contains all features and the target (`Production (GWh)`), to build and fine-tune your model.
   * Explore, clean, and transform the data as needed.

2. **Generate Predictions**:

   * Use your trained model to make predictions for the **Test Set (20%)**, which has all the features **except `Production (GWh)`**.
   * The Test Set also has an **`ID` column**, which uniquely identifies each row.

3. **Submit Your Results**:

   * Save your predictions as a **CSV file** with exactly **two columns**:

     * **`ID`**: Directly from the Test Set (must match exactly).
     * **`Predicted Production (GWh)`**: Your model’s predictions for each row.
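
A minimal sketch for writing the submission file with the standard `csv` module. `write_submission` is a hypothetical helper; it assumes `ids` holds the Test Set `ID` values as strings (so they are written back exactly as read) and `predictions` holds one number per ID, in the same order.

```python
import csv
from collections.abc import Sequence
from typing import TextIO

SUBMISSION_HEADER = ["ID", "Predicted Production (GWh)"]


def write_submission(ids: Sequence[str], predictions: Sequence[float], handle: TextIO) -> None:
    """Write exactly two columns, ID and Predicted Production (GWh), one row per test ID."""
    if len(ids) != len(predictions):
        raise ValueError(f"{len(ids)} IDs but {len(predictions)} predictions")
    writer = csv.writer(handle)
    writer.writerow(SUBMISSION_HEADER)
    for row_id, pred in zip(ids, predictions):
        writer.writerow([row_id, pred])
```

When writing to disk, open the file with `newline=""`, as the `csv` module documentation recommends, for example `open("submission.csv", "w", newline="")`; the file name is only an example.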

### 🌐 Ready to Start?

* Download the **Training Set** and **Test Set**.
* Build, train, and test your model.
* Submit your predictions. 🚀
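
Before fine-tuning a complex model, it can help to measure a simple baseline. The sketch below predicts the mean `Production (GWh)` for each `Energy Type` seen in training and falls back to the overall mean for unseen types. This group-mean baseline is an illustrative choice, not a recommended final model; `GroupMeanBaseline` is a hypothetical name, and rows are assumed to be dicts with a numeric (or `None`) target.

```python
from collections import defaultdict
from collections.abc import Mapping, Sequence

TARGET = "Production (GWh)"


class GroupMeanBaseline:
    """Predict the training mean of the target for each group (e.g. each Energy Type)."""

    def __init__(self, group_col: str = "Energy Type") -> None:
        self.group_col = group_col
        self.group_means: dict[str, float] = {}
        self.global_mean = 0.0

    def fit(self, rows: Sequence[Mapping]) -> "GroupMeanBaseline":
        labelled = [r for r in rows if r.get(TARGET) is not None]  # skip rows without a target
        if not labelled:
            raise ValueError("no rows with a target value")
        sums: dict[str, float] = defaultdict(float)
        counts: dict[str, int] = defaultdict(int)
        for r in labelled:
            sums[r[self.group_col]] += r[TARGET]
            counts[r[self.group_col]] += 1
        self.group_means = {g: sums[g] / counts[g] for g in sums}
        self.global_mean = sum(r[TARGET] for r in labelled) / len(labelled)
        return self

    def predict(self, rows: Sequence[Mapping]) -> list[float]:
        # Groups not seen during fit fall back to the overall training mean.
        return [self.group_means.get(r.get(self.group_col), self.global_mean) for r in rows]
```

Scoring this baseline on a held-out slice with RMSE gives a reference point that any more elaborate model should beat.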

🔎 Your model won’t just generate predictions; it can also help uncover the underlying drivers of renewable energy production and reveal where the biggest gains can be made!


## 🧑‍⚖️ Judging Criteria

Your submission will be evaluated using a **hybrid system**, combining **Model Accuracy (80%)** and **Community Votes (20%)**.

### 📊 **1. Model Accuracy (80%)**

* Your submission will be scored using **Root Mean Squared Error (RMSE)**, which measures how close your predictions are to the actual values in our **hidden test set**.
* The lower your RMSE, the better your model’s performance.
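
For reference, RMSE over $n$ rows with actual values $y_i$ and predictions $\hat{y}_i$ is $\sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$. Because errors are squared before averaging, a few large misses raise the score more than many small ones. Below is a minimal standard-library sketch for checking a validation split locally; the organizers' scoring code is not shown here, so treat it as an approximation of the official metric.

```python
import math
from collections.abc import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute RMSE of empty sequences")
    squared_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared_error / len(actual))
```

If scikit-learn is available, taking the square root of `sklearn.metrics.mean_squared_error` gives the same value.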

#### ✅ Submission Instructions:

* First, submit your Datalab workbook.
* Then, submit your predictions as a **.csv file** via this [Google Form](https://docs.google.com/forms/d/e/1FAIpQLScNiG0DakDEk39wLGtgsr1aAt-9Lm4-86w4tulaKM_w75Eodw/viewform?usp=sharing).
* Your file must contain **exactly two columns**:

  * **`ID`**: Directly from the Test Set (must match exactly).
  * **`Predicted Production (GWh)`**: Your model’s predictions for each row.

#### ✅ Submission Example:

| ID  | Predicted Production (GWh) |
| --- | -------------------------- |
| 1   | 50200.34                   |
| 2   | 67820.78                   |
| 3   | 45210.55                   |
| ... | ...                        |

#### ✅ Important:

* Use the **same email address** for the Google Form as the one associated with your DataCamp account. This is how we will link your submission to your Datalab workbook.
* Only submissions in the correct format will be accepted and scored.
* We will automatically check for formatting errors (missing IDs, extra IDs, or invalid columns).
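
Before uploading, a local check along the lines of the automatic one can catch problems early. The sketch below flags the error types listed above: wrong columns, IDs missing from the submission, and IDs not in the Test Set. It also flags duplicate IDs and non-numeric or non-finite predictions, which are extra assumptions about what counts as invalid. IDs are compared as exact strings; `validate_submission` is a hypothetical helper, and the organizers' checker may differ.

```python
import csv
import math
from collections.abc import Iterable

EXPECTED_HEADER = ["ID", "Predicted Production (GWh)"]


def validate_submission(lines: Iterable[str], test_ids: Iterable[str]) -> list[str]:
    """Return human-readable problems; an empty list means the file looks valid."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return [f"header must be exactly {EXPECTED_HEADER}, got {header}"]
    problems: list[str] = []
    seen: set[str] = set()
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        if len(row) != 2:
            problems.append(f"row {row_no}: expected 2 fields, got {len(row)}")
            continue
        row_id, value = row
        if row_id in seen:
            problems.append(f"row {row_no}: duplicate ID {row_id}")
        seen.add(row_id)
        try:
            number = float(value)
        except ValueError:
            number = math.nan  # unparseable values are reported below
        if not math.isfinite(number):
            problems.append(f"row {row_no}: prediction {value!r} is not a finite number")
    expected = set(test_ids)
    missing, extra = expected - seen, seen - expected
    if missing:
        problems.append(f"{len(missing)} missing IDs, e.g. {sorted(missing)[:5]}")
    if extra:
        problems.append(f"{len(extra)} extra IDs, e.g. {sorted(extra)[:5]}")
    return problems
```

For example, pass an open submission file (opened with `newline=""`) as `lines` and the `ID` values read from the Test Set as `test_ids`.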

### ✍️ **2. Community Votes (20%)**

* Once the competition ends, you will be able to view the top submissions from other participants.
* Vote for the most insightful, creative, or well-explained solutions.


## ✅ Checklist before publishing
- Rename your workspace to something that describes your work. N.B. keep the notebook file name as `notebook.ipynb`.
- **Remove redundant cells** like the introduction to data science notebooks, so the workbook is focused on your story.
- Check that all the cells run without error.

## ⏳ Data is the new fuel: let’s generate insights and electrify the future!