# SQL + Pandas

This section walks through a step-by-step example that stores Eurostat energy production data in an SQLite database and then uses Python to analyze and plot it. The guide covers each step, from loading the downloaded data files into SQLite to querying and plotting energy production for different countries.

This guide shows how to:
1. Download the data files from Eurostat.
2. Load the data into an SQLite database.
3. Use Python to query and visualize the data.

## Step 1: Download the Data from Eurostat

The datasets are available on the Eurostat website under [Energy Statistics](https://ec.europa.eu/eurostat/web/energy). The code below assumes the files have already been downloaded as `.xlsx` spreadsheets and stored under `../data/section4/euroStat/`.

For this example, we use the following datasets, one spreadsheet per energy source:

In [18]:
import numpy as np
import pandas as pd
import sqlite3
import os

In [8]:
euroStat_coal_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_coal.xlsx'
euroStat_nonRenewables_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_combustionFuels_nonRenewables.xlsx'
euroStat_renewables_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_combustionFuels_Renewables.xlsx'
euroStat_geothermal_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_geothermal.xlsx'
euroStat_hydro_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_hydro.xlsx'
euroStat_naturalGas_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_naturalGas.xlsx'
euroStat_nuclear_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_nuclear.xlsx'
euroStat_oil_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_oil.xlsx'
euro_otherRenewables_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_otherRenewables.xlsx'
euroStat_solar_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_solar.xlsx'
euroStat_wind_filePath = '../data/section4/euroStat/nrg_cb_pem_page_spreadsheet_wind.xlsx'
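The eleven paths above share one folder and one file-name prefix, so they can also be generated from a small lookup table. The sketch below is only an illustration: `BASE_DIR`, `PREFIX`, `FILE_SUFFIXES`, and `file_paths` are hypothetical names that do not appear elsewhere in this section, and the suffixes are copied from the file names listed above.

```python
# Folder and prefix copied from the paths defined above
BASE_DIR = '../data/section4/euroStat'
PREFIX = 'nrg_cb_pem_page_spreadsheet_'

# Hypothetical lookup table: dataset name -> file-name suffix
FILE_SUFFIXES = {
    'coal': 'coal',
    'nonRenewables': 'combustionFuels_nonRenewables',
    'renewables': 'combustionFuels_Renewables',
    'geothermal': 'geothermal',
    'hydro': 'hydro',
    'naturalGas': 'naturalGas',
    'nuclear': 'nuclear',
    'oil': 'oil',
    'otherRenewables': 'otherRenewables',
    'solar': 'solar',
    'wind': 'wind',
}

# Build one path per dataset
file_paths = {
    name: f'{BASE_DIR}/{PREFIX}{suffix}.xlsx'
    for name, suffix in FILE_SUFFIXES.items()
}

print(file_paths['coal'])
```

Keeping the paths in a dictionary makes it possible to loop over all datasets later instead of repeating one line per energy source.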

## Step 2: Load Data into SQLite Database

We'll now read the downloaded Excel files, clean the data, and store it in an SQLite database.

### Python Code for Loading Excel Files into SQLite

In [9]:
# Load .xlsx file
euroStat_coal = pd.read_excel(euroStat_coal_filePath, sheet_name='coal', skiprows=range(0, 8))
euroStat_nonRenewables = pd.read_excel(euroStat_nonRenewables_filePath, sheet_name='nonRenewables', skiprows=range(0, 8))
euroStat_renewables = pd.read_excel(euroStat_renewables_filePath, sheet_name='renewables', skiprows=range(0, 8))
euroStat_geothermal = pd.read_excel(euroStat_geothermal_filePath, sheet_name='geothermal', skiprows=range(0, 8))
euroStat_hydro = pd.read_excel(euroStat_hydro_filePath, sheet_name='hydro', skiprows=range(0, 8))
euroStat_naturalGas = pd.read_excel(euroStat_naturalGas_filePath, sheet_name='naturalGas', skiprows=range(0, 8))
euroStat_nuclear = pd.read_excel(euroStat_nuclear_filePath, sheet_name='nuclear', skiprows=range(0, 8))
euroStat_oil = pd.read_excel(euroStat_oil_filePath, sheet_name='oil', skiprows=range(0, 8))
euro_otherRenewables = pd.read_excel(euro_otherRenewables_filePath, sheet_name='otherRenewables', skiprows=range(0, 8))
euroStat_solar = pd.read_excel(euroStat_solar_filePath, sheet_name='solar', skiprows=range(0, 8))
euroStat_wind = pd.read_excel(euroStat_wind_filePath, sheet_name='wind', skiprows=range(0, 8))


In [10]:
euroStat_coal.head()

Unnamed: 0,TIME,2016-01,Unnamed: 2,2016-02,Unnamed: 4,2016-03,Unnamed: 6,2016-04,Unnamed: 8,2016-05,...,Unnamed: 196,2024-03,Unnamed: 198,2024-04,Unnamed: 200,2024-05,Unnamed: 202,2024-06,Unnamed: 204,2024-07
0,GEO (Labels),,,,,,,,,,...,,,,,,,,,,
1,European Union - 27 countries (from 2020),:,,:,,:,,:,,:,...,,23110.523,,:,,:,,:,,:
2,Belgium,:,,:,,:,,:,,:,...,,160.492,,168.435,,179.131,,:,,:
3,Bulgaria,:,,:,,:,,:,,:,...,,519.203,,360.31,,290.517,,:,,:
4,Czechia,:,,:,,:,,:,,:,...,,2159.708,,1550.964,,1480.491,,:,,:


In [11]:
# Path of the SQLite database file
db_path = 'energy_data.db'

# Connect to the SQLite database (the file is created if it doesn't exist)
conn = sqlite3.connect(db_path)
cursor = conn.cursor()
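As a minimal sketch of the "created if it doesn't exist" behavior, the example below connects to a database file inside a temporary folder and checks for the file before and after. The file name `demo_energy.db`, the table `ping`, and the variable names are hypothetical and chosen only so that the sketch does not touch the real database.

```python
import os
import sqlite3
import tempfile

# Use a throwaway directory so the sketch does not touch a real database file
tmp_dir = tempfile.mkdtemp()
demo_db_path = os.path.join(tmp_dir, 'demo_energy.db')  # hypothetical file name

existed_before = os.path.exists(demo_db_path)

# Connecting and writing creates the file on disk
conn_demo = sqlite3.connect(demo_db_path)
conn_demo.execute('CREATE TABLE IF NOT EXISTS ping (id INTEGER)')
conn_demo.commit()
conn_demo.close()

existed_after = os.path.exists(demo_db_path)
print(existed_before, existed_after)
```

For quick experiments, `sqlite3.connect(':memory:')` gives a database that lives only in RAM and disappears when the connection is closed.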

In [12]:
# Write the euroStat_coal DataFrame to the database with pandas' to_sql
euroStat_coal.to_sql('euroStat_coal', conn, if_exists='replace', index=False)

50

In [13]:

# Check if euroStat_coal table exists in the database
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name='euroStat_coal';")
table_exists = cursor.fetchone()

if table_exists:
    print("The 'euroStat_coal' table exists in the database.")
else:
    print("The 'euroStat_coal' table does not exist in the database.")

The 'euroStat_coal' table exists in the database.
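The same `sqlite_master` lookup can be wrapped in a small helper and extended to list every table. This is a sketch against an in-memory database; `table_exists`, `conn_demo`, and the two demo tables are hypothetical and stand in for the real `energy_data.db` contents.

```python
import sqlite3

# In-memory stand-in for the real database
conn_demo = sqlite3.connect(':memory:')
conn_demo.execute('CREATE TABLE euroStat_coal (TIME TEXT, "2024-03" REAL)')
conn_demo.execute('CREATE TABLE euroStat_wind (TIME TEXT, "2024-03" REAL)')


def table_exists(connection, table_name):
    """Return True if a table with this exact name is present."""
    row = connection.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


# List all tables, sorted by name
all_tables = [
    r[0]
    for r in conn_demo.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
coal_present = table_exists(conn_demo, 'euroStat_coal')
solar_present = table_exists(conn_demo, 'euroStat_solar')
conn_demo.close()

print(all_tables, coal_present, solar_present)
```

Passing the table name as a `?` parameter avoids building the SQL string by hand.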


In [14]:
# Query the first few rows from the euroStat_coal table
df_coal = pd.read_sql_query("SELECT * FROM euroStat_coal LIMIT 5;", conn)

# Display the data
print(df_coal)

                                        TIME 2016-01 Unnamed: 2 2016-02  \
0                               GEO (Labels)    None       None    None   
1  European Union - 27 countries (from 2020)       :       None       :   
2                                    Belgium       :       None       :   
3                                   Bulgaria       :       None       :   
4                                    Czechia       :       None       :   

  Unnamed: 4 2016-03 Unnamed: 6 2016-04 Unnamed: 8 2016-05  ... Unnamed: 196  \
0       None    None       None    None       None    None  ...         None   
1       None       :       None       :       None       :  ...         None   
2       None       :       None       :       None       :  ...         None   
3       None       :       None       :       None       :  ...         None   
4       None       :       None       :       None       :  ...         None   

     2024-03 Unnamed: 198   2024-04 Unnamed: 200   2024-05 Unnamed: 

Rows 0 and 1 hold the `GEO (Labels)` header row and the European Union aggregate rather than individual countries, so we remove them from the `euroStat_coal` table with the following steps:

* **Step 1**: Remove Row 0 and Row 1 from the `euroStat_coal` Table

If the table had an ID or other unique identifier, we could remove the rows with SQL `DELETE` commands. The table has no such column, so we drop the rows from the DataFrame in Pandas and then replace the table in the database.
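For comparison, the SQL `DELETE` route mentioned above can be sketched with SQLite's implicit `rowid`, which numbers rows from 1 in a freshly created table. This is an illustration on a hypothetical in-memory table `coal_demo`, not the method used in this section, and it assumes the rows were inserted in the order shown.

```python
import sqlite3

conn_demo = sqlite3.connect(':memory:')
conn_demo.execute('CREATE TABLE coal_demo (TIME TEXT, "2024-03" TEXT)')
conn_demo.executemany(
    'INSERT INTO coal_demo VALUES (?, ?)',
    [
        ('GEO (Labels)', None),
        ('European Union - 27 countries (from 2020)', '23110.523'),
        ('Belgium', '160.492'),
        ('Bulgaria', '519.203'),
    ],
)

# The first two inserted rows received rowid 1 and 2
conn_demo.execute('DELETE FROM coal_demo WHERE rowid IN (1, 2)')
conn_demo.commit()

remaining = [
    r[0] for r in conn_demo.execute('SELECT TIME FROM coal_demo ORDER BY rowid')
]
conn_demo.close()

print(remaining)
```

Deleting by `rowid` depends on insertion order, which is one reason dropping the rows in Pandas and rewriting the table is the simpler choice here.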

In [15]:
# Drop the first two rows (index 0 and 1) from the dataframe
euroStat_coal_cleaned = euroStat_coal.drop([0, 1])

# Display the cleaned data
print(euroStat_coal_cleaned.head())

       TIME 2016-01  Unnamed: 2 2016-02  Unnamed: 4 2016-03  Unnamed: 6  \
2   Belgium       :         NaN       :         NaN       :         NaN   
3  Bulgaria       :         NaN       :         NaN       :         NaN   
4   Czechia       :         NaN       :         NaN       :         NaN   
5   Denmark       :         NaN       :         NaN       :         NaN   
6   Germany       :         NaN       :         NaN       :         NaN   

  2016-04  Unnamed: 8 2016-05  ...  Unnamed: 196   2024-03  Unnamed: 198  \
2       :         NaN       :  ...           NaN   160.492           NaN   
3       :         NaN       :  ...           NaN   519.203           NaN   
4       :         NaN       :  ...           NaN  2159.708           NaN   
5       :         NaN       :  ...           NaN   276.518           NaN   
6       :         NaN       :  ...             e  8758.996             e   

    2024-04  Unnamed: 200   2024-05  Unnamed: 202 2024-06  Unnamed: 204  \
2   168.435      

*  **Step 2**: Replace the euroStat_coal Table in the SQLite Database

After removing the first two rows, we can replace the existing euroStat_coal table with the cleaned DataFrame.

In [16]:
# Replace the table in the SQLite database with the cleaned data
euroStat_coal_cleaned.to_sql('euroStat_coal', conn, if_exists='replace', index=False)

# Commit the changes
conn.commit()

* **Step 3**: Query the First Few Rows from the euroStat_coal Table and Display the Data

Now that the first two rows have been removed, you can query the table and display the first few rows to confirm.

In [17]:
# Query the first few rows from the cleaned euroStat_coal table
df_coal_cleaned = pd.read_sql_query("SELECT * FROM euroStat_coal LIMIT 5;", conn)

# Display the data
print(df_coal_cleaned)

       TIME 2016-01 Unnamed: 2 2016-02 Unnamed: 4 2016-03 Unnamed: 6 2016-04  \
0   Belgium       :       None       :       None       :       None       :   
1  Bulgaria       :       None       :       None       :       None       :   
2   Czechia       :       None       :       None       :       None       :   
3   Denmark       :       None       :       None       :       None       :   
4   Germany       :       None       :       None       :       None       :   

  Unnamed: 8 2016-05  ... Unnamed: 196   2024-03 Unnamed: 198   2024-04  \
0       None       :  ...         None   160.492         None   168.435   
1       None       :  ...         None   519.203         None    360.31   
2       None       :  ...         None  2159.708         None  1550.964   
3       None       :  ...         None   276.518         None   148.599   
4       None       :  ...            e  8758.996            e   6004.85   

  Unnamed: 200   2024-05 Unnamed: 202 2024-06 Unnamed: 204 2024-07  

* **Step 4**: Drop Columns with Names Starting with “Unnamed”

You can build a boolean mask with `columns.str.startswith` and pass it to `.loc` to keep only the columns whose names do not start with “Unnamed”:

In [19]:
# Drop columns where the column name starts with 'Unnamed'
euroStat_coal_cleaned = euroStat_coal_cleaned.loc[:, ~euroStat_coal_cleaned.columns.str.startswith('Unnamed')]

# Display the DataFrame after removing 'Unnamed' columns
print(euroStat_coal_cleaned.head())

       TIME 2016-01 2016-02 2016-03 2016-04 2016-05 2016-06 2016-07 2016-08  \
2   Belgium       :       :       :       :       :       :       :       :   
3  Bulgaria       :       :       :       :       :       :       :       :   
4   Czechia       :       :       :       :       :       :       :       :   
5   Denmark       :       :       :       :       :       :       :       :   
6   Germany       :       :       :       :       :       :       :       :   

  2016-09  ...   2023-10    2023-11    2023-12    2024-01   2024-02   2024-03  \
2       :  ...    70.669      64.24    100.765    152.345   175.176   160.492   
3       :  ...   781.035    871.225   1043.759    905.171   583.303   519.203   
4       :  ...  2610.974   2753.006   2762.875   2590.198  2245.389  2159.708   
5       :  ...    44.275    226.176    257.861    392.265    248.43   276.518   
6       :  ...  9674.588  10947.152  10597.592  10536.739  8501.007  8758.996   

    2024-04   2024-05 2024-06 2024-07 
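Before discarding the “Unnamed” columns it can be worth looking at what they contain, since in the output above they carry values such as `e` next to some figures (these appear to be Eurostat flags, for example "estimated"). The sketch below uses a tiny hypothetical DataFrame `demo` with two rows taken from the printed output; `DataFrame.filter(regex=...)` selects the columns for inspection, and the boolean mask is the same one used in the cell above.

```python
import numpy as np
import pandas as pd

# Two rows copied from the printed output above; the frame itself is hypothetical
demo = pd.DataFrame({
    'TIME': ['Denmark', 'Germany'],
    '2024-03': [276.518, 8758.996],
    'Unnamed: 198': [np.nan, 'e'],
    '2024-04': [148.599, 6004.85],
})

# Columns that would be dropped; inspect them before discarding
flag_columns = demo.filter(regex=r'^Unnamed')
print(flag_columns)

# Same boolean-mask selection as in the cell above
demo_values = demo.loc[:, ~demo.columns.str.startswith('Unnamed')]
print(list(demo_values.columns))
```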

* **Step 5**: Replace `:` with `np.nan`

To replace all occurrences of `:` with `np.nan`, you can use the replace function:

In [20]:
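# Replace all occurrences of ':' with np.nan (assign the result instead of using
# inplace=True, which can trigger a warning on a DataFrame produced by slicing)
euroStat_coal_cleaned = euroStat_coal_cleaned.replace(':', np.nan)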

# Display the DataFrame to confirm changes
print(euroStat_coal_cleaned.head())

       TIME 2016-01  2016-02  2016-03  2016-04  2016-05  2016-06  2016-07  \
2   Belgium     NaN      NaN      NaN      NaN      NaN      NaN      NaN   
3  Bulgaria     NaN      NaN      NaN      NaN      NaN      NaN      NaN   
4   Czechia     NaN      NaN      NaN      NaN      NaN      NaN      NaN   
5   Denmark     NaN      NaN      NaN      NaN      NaN      NaN      NaN   
6   Germany     NaN      NaN      NaN      NaN      NaN      NaN      NaN   

   2016-08  2016-09  ...   2023-10    2023-11    2023-12    2024-01   2024-02  \
2      NaN      NaN  ...    70.669     64.240    100.765    152.345   175.176   
3      NaN      NaN  ...   781.035    871.225   1043.759    905.171   583.303   
4      NaN      NaN  ...  2610.974   2753.006   2762.875   2590.198  2245.389   
5      NaN      NaN  ...    44.275    226.176    257.861    392.265   248.430   
6      NaN      NaN  ...  9674.588  10947.152  10597.592  10536.739  8501.007   

    2024-03   2024-04   2024-05  2024-06  2024-07 

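Two related options are sketched below on hypothetical data: telling the reader to treat `:` as missing while parsing (the `na_values` argument, which `pd.read_excel` also accepts), and coercing already-loaded text columns to numbers with `pd.to_numeric`. The small frames `demo` and `loaded` are made up for illustration, with values taken from the Belgium and Bulgaria rows shown earlier.

```python
import io
import pandas as pd

# Hypothetical CSV text with ':' marking missing values
raw = io.StringIO(
    'TIME,2024-05,2024-06\n'
    'Belgium,179.131,:\n'
    'Bulgaria,290.517,:\n'
)

# na_values tells the reader to treat ':' as missing while parsing
demo = pd.read_csv(raw, na_values=[':'])
print(demo.dtypes)

# For a frame that is already loaded, coerce each month column to numbers;
# anything that is not a number (such as ':') becomes NaN
loaded = pd.DataFrame({'TIME': ['Belgium'], '2024-05': ['179.131'], '2024-06': [':']})
month_cols = loaded.columns[1:]
loaded[month_cols] = loaded[month_cols].apply(pd.to_numeric, errors='coerce')
print(loaded)
```

Converting to numeric types matters for later steps, because plotting and arithmetic behave unpredictably on columns that mix numbers and strings.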


* **Step 6**: Replace the Updated Data in SQLite Database

After cleaning the data, you can update the SQLite database:

In [None]:
# Replace the table in the SQLite database with the updated data
euroStat_coal_cleaned.to_sql('euroStat_coal', conn, if_exists='replace', index=False)

# Commit the changes
conn.commit()

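* **Step 7**: Query the Data to Confirm the Changes

Finally, you can query the first few rows to verify that the “Unnamed” columns were dropped and the `:` values were replaced. This check is only meaningful after the Step 6 cell has been run, because it reads whatever is currently stored in the database.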

In [22]:
# Query the first few rows from the cleaned euroStat_coal table
df_coal_cleaned = pd.read_sql_query("SELECT * FROM euroStat_coal LIMIT 5;", conn)

# Display the data
print(df_coal_cleaned)

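The verification can also be done programmatically instead of by eye. The sketch below runs against a hypothetical in-memory table `coal_demo` that mimics a cleaned table; `PRAGMA table_info` is used to list the column names, and a `COUNT(*)` query checks that no `:` placeholder is left in one month column.

```python
import sqlite3

conn_demo = sqlite3.connect(':memory:')
conn_demo.execute(
    'CREATE TABLE coal_demo (TIME TEXT, "2024-03" REAL, "2024-06" REAL)'
)
conn_demo.executemany(
    'INSERT INTO coal_demo VALUES (?, ?, ?)',
    [('Belgium', 160.492, None), ('Bulgaria', 519.203, None)],
)

# PRAGMA table_info returns one row per column; index 1 holds the column name
column_names = [row[1] for row in conn_demo.execute('PRAGMA table_info(coal_demo)')]
leftover_unnamed = [c for c in column_names if c.startswith('Unnamed')]

# Count ':' placeholders that survived in one month column
colon_count = conn_demo.execute(
    'SELECT COUNT(*) FROM coal_demo WHERE "2024-06" = ?', (':',)
).fetchone()[0]
conn_demo.close()

print(column_names, leftover_unnamed, colon_count)
```

Against the real database, the same two checks would use the `euroStat_coal` table and the existing `conn` object.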

## Workflow Summary

This workflow loads energy production data from Eurostat into an SQLite database using Pandas and Python. It then cleans the data by removing the two non-country rows, dropping the “Unnamed” columns, and replacing the missing-value marker `:` with NaN. Finally, the cleaned data is written back to the SQLite database and queried for verification.

1.	**Load Data from Excel**: Load energy production data from Eurostat .xlsx files into Pandas DataFrames.
2.	**Store Data in SQLite Database**: Write the DataFrame to an SQLite database.
3.	**Verify Table Creation**: Check if the data has been successfully added to the SQLite database.
4.	**Remove Rows**: Remove the first two rows (index 0 and 1) from the DataFrame.
5.	**Drop Unnecessary Columns**: Remove columns whose names start with “Unnamed”.
6.	**Replace Invalid Values**: Replace `:` with NaN in the DataFrame.
7.	**Update SQLite Database**: Replace the existing table in the SQLite database with the cleaned DataFrame.
8.	**Query and Verify**: Query the cleaned data from the SQLite database to verify that the cleaning steps were successful.
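The cleaning steps in this list (remove rows, drop columns, replace values) can be collected into one function so that the same treatment is applied to each energy source. This is a minimal sketch, assuming every spreadsheet has the same layout as the coal file; `clean_eurostat` and the demo frame `raw_demo` are hypothetical, and the final `reset_index` is an addition that is not part of the steps above.

```python
import numpy as np
import pandas as pd


def clean_eurostat(df):
    """Apply the three cleaning steps from this section to one raw DataFrame."""
    # Remove the label row (index 0) and the EU aggregate (index 1)
    cleaned = df.drop(index=[0, 1])
    # Drop columns whose names start with 'Unnamed'
    cleaned = cleaned.loc[:, ~cleaned.columns.str.startswith('Unnamed')]
    # Replace the ':' marker with NaN
    cleaned = cleaned.replace(':', np.nan)
    # Addition for convenience: renumber the rows from 0
    return cleaned.reset_index(drop=True)


# Small hypothetical frame shaped like the raw coal data
raw_demo = pd.DataFrame({
    'TIME': ['GEO (Labels)', 'European Union - 27 countries (from 2020)',
             'Belgium', 'Bulgaria'],
    '2024-03': [np.nan, 23110.523, 160.492, 519.203],
    'Unnamed: 198': [np.nan, np.nan, np.nan, np.nan],
    '2024-06': [np.nan, ':', ':', ':'],
})

result = clean_eurostat(raw_demo)
print(result)
```

With such a helper, the write-back step becomes `clean_eurostat(euroStat_coal).to_sql('euroStat_coal', conn, if_exists='replace', index=False)`.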

---

## Step 2: Load Data into SQLite Database

We'll now read the downloaded CSV files, clean the data, and store them in an SQLite database.


### Python Code for Loading CSV Files into SQLite

In [3]:
# Define file paths and dataset names
datasets = ['coal', 'nonRenewables', 'renewables', 'geothermal', 'hydro', 'naturalGas', 'nuclear', 'oil', 'otherRenewables', 'solar', 'wind']
data_path = os.path.expanduser('~/Downloads/')  # Path where the CSV files are downloaded
db_path = 'energy_data.db'  # SQLite database file

In [None]:
# Open the database connection used by the loop below
conn = sqlite3.connect(db_path)

for dataset in datasets:
    # Load the CSV file into a pandas DataFrame
    csv_file = os.path.join(data_path, f'{dataset}.csv')
    df = pd.read_csv(csv_file, skiprows=1)  # Assuming the first row is metadata

    # Rename columns: the first is the country, the rest are numbered months (Month_1, Month_2, ...)
    df.columns = ['Country'] + [f'Month_{i+1}' for i in range(len(df.columns)-1)]

    # Store each DataFrame in a table named after the dataset (e.g., 'wind', 'geothermal')
    df.to_sql(dataset, conn, if_exists='replace', index=False)

    print(f"Data from {dataset}.csv loaded into the {dataset} table.")

# Commit and close the database connection
conn.commit()
conn.close()

### Explanation:
1. **Loading CSV Files**: We read each CSV file and load it into a pandas DataFrame, assuming the first row is metadata.
2. **Cleaning Data**: We rename the columns, as some columns might be unnamed.
3. **Storing in SQLite**: Each dataset is saved as a table in the SQLite database. For example, the `wind.csv` file will be saved in a table named `wind`.
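The generic `Month_1`, `Month_2`, ... names lose the calendar information. If the columns are known to be consecutive months, real labels can be generated instead. The sketch below assumes the series starts in January 2016, as in the spreadsheet output shown earlier; that starting point and the helper name `month_labels` are assumptions, and the actual CSV layout should be checked first.

```python
import pandas as pd


def month_labels(n_months, start='2016-01'):
    """Return n_months consecutive 'YYYY-MM' labels beginning at start."""
    periods = pd.period_range(start=start, periods=n_months, freq='M')
    return list(periods.strftime('%Y-%m'))


labels = month_labels(14)
print(labels)
```

Inside the loading loop this would be used as `df.columns = ['Country'] + month_labels(len(df.columns) - 1)`.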

## Step 3: Query and Analyze the Data

Now that we’ve stored the data in an SQLite database, we can query it using SQL and visualize it in Python. Let’s start by analyzing and plotting the wind energy production data.

### Python Code for Querying and Plotting Wind Data

In [None]:
import matplotlib.pyplot as plt

# Reconnect to the SQLite database
conn = sqlite3.connect('energy_data.db')

# Query wind energy data
wind_query = """
SELECT * FROM wind;
"""

# Load the data into a pandas DataFrame
wind_data = pd.read_sql_query(wind_query, conn)

# Close the database connection
conn.close()

# Preview the data
print(wind_data.head())

# Example plotting wind energy production for Belgium
country = 'Belgium'

# Extract the row for Belgium
wind_belgium = wind_data[wind_data['Country'] == country]

# Months go on the x-axis; values are coerced to numbers so that ':' becomes NaN,
# then missing values are filled with 0 for simplicity
months = wind_belgium.columns[1:]  # Skip the 'Country' column
energy_values = pd.to_numeric(wind_belgium.iloc[0, 1:], errors='coerce').fillna(0)  # Skip the 'Country' column

# Plotting the data
plt.figure(figsize=(10, 6))
plt.plot(months, energy_values, marker='o')
plt.title(f'Wind Energy Production in {country} (MWh)')
plt.xlabel('Months')
plt.ylabel('Energy Generated (MWh)')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()

### Explanation:
1. **Query Data**: We query the `wind` table and load the data into a pandas DataFrame.
2. **Data Cleaning**: We fill missing values (`NaN` or `:` in the dataset) with `0` for simplicity.
3. **Plotting**: We plot the wind energy production for Belgium over the months using `matplotlib`.
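Instead of loading the whole `wind` table and filtering in Pandas, the country filter can be pushed into the SQL query with a `?` placeholder. The sketch below uses a hypothetical in-memory `wind` table with placeholder numbers, not real Eurostat values.

```python
import sqlite3

conn_demo = sqlite3.connect(':memory:')
conn_demo.execute('CREATE TABLE wind (Country TEXT, Month_1 REAL, Month_2 REAL)')
conn_demo.executemany(
    'INSERT INTO wind VALUES (?, ?, ?)',
    # Placeholder numbers for illustration only
    [('Belgium', 1.0, 2.0), ('Germany', 3.0, 4.0), ('Spain', 5.0, 6.0)],
)

country = 'Belgium'

# The value is passed separately from the SQL text, so no manual quoting is needed
rows = conn_demo.execute(
    'SELECT * FROM wind WHERE Country = ?', (country,)
).fetchall()
conn_demo.close()

print(rows)
```

`pd.read_sql_query` accepts the same placeholders through its `params` argument, for example `pd.read_sql_query('SELECT * FROM wind WHERE Country = ?', conn, params=(country,))`.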

## Step 4: Plotting Energy Production for Multiple Countries

You can extend this example to plot data for multiple countries or compare the production of different renewable energy sources.

### Example Code to Plot Energy Production for Multiple Countries

In [None]:
# Example for comparing wind energy production of Belgium, Germany, and Spain
countries = ['Belgium', 'Germany', 'Spain']
plt.figure(figsize=(10, 6))

for country in countries:
    country_data = wind_data[wind_data['Country'] == country]
    # Skip the 'Country' column; coerce ':' to NaN, then fill missing values with 0
    energy_values = pd.to_numeric(country_data.iloc[0, 1:], errors='coerce').fillna(0)
    
    plt.plot(months, energy_values, marker='o', label=country)

# Add labels, title, and legend
plt.title('Wind Energy Production in Belgium, Germany, and Spain (MWh)')
plt.xlabel('Months')
plt.ylabel('Energy Generated (MWh)')
plt.xticks(rotation=45)
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

### Explanation:
1. **Multiple Countries**: We loop through a list of countries (e.g., Belgium, Germany, Spain) and plot their wind energy production on the same graph for comparison.
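An alternative to the explicit loop is to reshape the data so that each country becomes a column. The sketch below uses a hypothetical frame `wind_demo` with placeholder numbers; `isin` selects the countries of interest and silently skips any that are missing from the table, and the transpose puts the months on the index.

```python
import pandas as pd

# Placeholder numbers for illustration only
wind_demo = pd.DataFrame({
    'Country': ['Belgium', 'Germany', 'Spain', 'France'],
    'Month_1': [1.0, 3.0, 5.0, 7.0],
    'Month_2': [2.0, 4.0, 6.0, 8.0],
})

countries = ['Belgium', 'Germany', 'Spain']

# Keep only the requested countries, then make one column per country
selected = wind_demo[wind_demo['Country'].isin(countries)]
by_month = selected.set_index('Country').T

print(by_month)
```

With matplotlib installed, `by_month.plot(marker='o')` then draws one line per country with a legend.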

## Workflow Summary

1. **Downloading and Loading Data**: We downloaded the CSV files from Eurostat and loaded them into an SQLite database.
2. **Querying the Database**: We queried specific tables from the database using SQL queries in Python.
3. **Visualizing the Data**: We plotted the renewable energy production data (wind in this case) using matplotlib to visualize the monthly energy production for different countries.

This approach keeps the data in a local database file, so you can query only the tables and rows you need and then analyze and plot the results in Python.


## Summary

In this section, we provided a step-by-step guide to combining energy production data stored in an SQLite database with Python for data analysis and visualization. The process includes downloading the data files from the Eurostat website, loading them into an SQLite database using Python, and then querying the data for analysis and visualization.

## Key Points Covered:

1.	Data Download and Preparation: We downloaded the relevant data files for different energy sources (wind, solar, geothermal, hydro, coal, etc.) from Eurostat and stored them locally.
2.	Loading Data into SQLite: The downloaded CSV files were cleaned and stored in an SQLite database using the pandas and sqlite3 libraries in Python. Each dataset was stored in its own table within the database.
3.	Querying the Data: We queried the SQLite database for specific data (e.g., wind energy production) using SQL commands within Python.
4.	Visualizing Data: Using matplotlib, we created plots of the energy production data (e.g., wind energy production in Belgium) across different months.
5.	Comparison Across Countries: We extended the example to compare the energy production of multiple countries on a single plot, allowing for better insights into trends and differences in renewable energy generation.

## Lessons Learned

1.	**Handling Large Datasets**: By using `SQLite`, we can efficiently store and manage large datasets locally, allowing for easy querying and retrieval of data as needed.
2.	**Data Cleaning and Preparation**: Properly cleaning and formatting the data, such as renaming columns and filling missing values, is essential for accurate analysis.
3.	**SQL Integration with Python**: Using `SQL` within Python allows for powerful querying capabilities, which can be further extended by combining results with pandas for in-depth analysis.
4.	**Visualization**: Python’s `matplotlib` library makes it easy to create visual representations of the data, helping to uncover trends and insights from complex datasets.
5.	**Scalability**: The approach can be easily extended to more complex datasets, multiple energy sources, or multiple countries, providing a scalable solution for energy data analysis.