# **Problem: Predict solar power generation**

1. Combine two CSV files in which each row belongs to a unique timestamp that is common to both tables.
2. Predict the solar power generation at any point in time using the combined CSV file.

**Examples:**

Set the variable 'url' to the Google Drive URL of the zip file that you want to download.

Example: url = 'https://drive.google.com/file/d/1dVBMQb-eKRq92WMKfJDbTBt-j-W_5s5u/view?usp=sharing'

Run all the cells. After the last cell has executed, you will see a scatter plot of the predicted solar power against the actual solar power generated.

**Notes:**

Check the following before running the program.
1. The pandas and matplotlib modules must be installed on the local machine.
2. The sklearn module (provided by the scikit-learn package) must be installed on the local machine.
3. The gdown module must be installed on the local machine.
4. The zipfile module is part of the Python standard library and needs no separate installation.
5. Check that you have given the correct location of your dataset file.
6. You must have access to the file in Google Drive.
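
The module checks in the notes above could be automated before running the rest of the notebook. The following is a minimal sketch, not part of the original notebook: the helper name `missing_packages` and the list `REQUIRED_PACKAGES` are hypothetical, and the check only tests whether each module can be located, not whether its version is suitable.

```python
from importlib.util import find_spec

# Third-party modules imported by this notebook. zipfile is left out because
# it ships with Python. The names are import names, so "sklearn" is listed
# even though the package is installed as "scikit-learn".
REQUIRED_PACKAGES = ["pandas", "matplotlib", "sklearn", "gdown"]


def missing_packages(names):
    """Return the module names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```

Any module reported as missing can then be installed with pip before the import cell is run.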



# **Import Modules**

In [1]:
# Import pandas
import pandas as pd

# Import pyplot module to plot the results
import matplotlib.pyplot as plt

# Import train_test_split module to split the data into train and test
from sklearn.model_selection import train_test_split

# Import LinearRegression module to use in model training
from sklearn.linear_model import LinearRegression

# Import gdown module to download files from google drive
import gdown

# Import zip file module to open the zip file
from zipfile import ZipFile

# **Get the file location from Google Drive and download**

In [6]:
# Please change the URL as needed (make sure you have access to the file)

url = 'https://drive.google.com/file/d/1dVBMQb-eKRq92WMKfJDbTBt-j-W_5s5u/view?usp=sharing'

# Derive the file id from the URL
file_id = url.split('/')[-2]

# Derive the download URL of the file
download_url = 'https://drive.google.com/uc?id=' + file_id

# Give the location where the file should be saved on your local machine
file_location = 'solar.zip'

# Download the file from drive to your local machine
gdown.download(download_url, file_location, quiet=False)

Downloading...
From: https://drive.google.com/uc?id=1dVBMQb-eKRq92WMKfJDbTBt-j-W_5s5u
To: /content/solar.zip
100%|██████████| 1.01M/1.01M [00:00<00:00, 105MB/s]


'solar.zip'
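
The cell above takes the file ID with `url.split('/')[-2]`, which relies on the URL ending in a segment such as `/view?usp=sharing`. As a hedged alternative, the sketch below pulls the ID out with a regular expression. The function name `extract_drive_file_id` is hypothetical, and the two URL shapes it accepts (`/file/d/<id>` and `?id=<id>`) are the only ones assumed here; other sharing link formats are not handled.

```python
import re


def extract_drive_file_id(url):
    """Return the file ID from a Google Drive URL, or raise ValueError."""
    # Form used in this notebook: .../file/d/<id>/view?usp=sharing
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", url)
    if match is None:
        # Direct-download form: .../uc?id=<id>
        match = re.search(r"[?&]id=([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError(f"No Google Drive file ID found in: {url}")
    return match.group(1)


url = 'https://drive.google.com/file/d/1dVBMQb-eKRq92WMKfJDbTBt-j-W_5s5u/view?usp=sharing'
file_id = extract_drive_file_id(url)
print(file_id)  # 1dVBMQb-eKRq92WMKfJDbTBt-j-W_5s5u
```

Raising an error for an unrecognized URL makes a bad link fail at this step rather than later, during the download.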

# **Unzip the zip dataset**

In [7]:
!unzip /content/solar.zip -d "/content/unzipped_folder/"

Archive:  /content/solar.zip
   creating: /content/unzipped_folder/solar/
  inflating: /content/unzipped_folder/solar/Plant_2_Generation_Data.csv  
  inflating: /content/unzipped_folder/solar/Plant_2_Weather_Sensor_Data.csv  
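
The `!unzip` command above is a shell call, so it depends on an `unzip` binary being available. The `ZipFile` class imported earlier can do the same job from Python. The following is a minimal sketch under that assumption: the helper `extract_archive` is hypothetical, and the demonstration builds a throwaway archive with a made-up CSV so that it runs without `solar.zip`.

```python
import tempfile
from pathlib import Path
from zipfile import ZipFile


def extract_archive(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the member names."""
    with ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Demonstration on a temporary archive (contents are invented for illustration).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with ZipFile(demo_zip, "w") as archive:
        archive.writestr("solar/example.csv", "DATE_TIME,AC_POWER\n")

    extracted = extract_archive(demo_zip, Path(tmp) / "unzipped_folder")
    example_exists = (Path(tmp) / "unzipped_folder" / "solar" / "example.csv").exists()

print(extracted)
```

In this notebook, a call such as `extract_archive('solar.zip', 'unzipped_folder')` would take the place of the `!unzip` cell.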


# **Read and combine the CSVs**

In [9]:
# Read 1st csv file
plant = pd.read_csv('unzipped_folder/solar/Plant_2_Generation_Data.csv', sep = ',', engine = 'python', header = 0)

# Read 2nd csv file
weather = pd.read_csv('unzipped_folder/solar/Plant_2_Weather_Sensor_Data.csv', sep = ',', engine = 'python', header = 0)

# Combine the two CSV files on the DATE_TIME and PLANT_ID columns
combined_file = plant.merge(weather, on=["DATE_TIME", "PLANT_ID"], suffixes=("_GENERATION", "_WEATHER"))

# Save the combined table as a CSV
combined_file.to_csv('output.csv', sep = ',')
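
To make the behavior of the merge above concrete, here is a small sketch on toy data. All values are made up, and the `SOURCE` column is a hypothetical column that exists in both tables purely to show how the suffixes are applied. The `validate="many_to_one"` argument is an addition not used in the notebook; it makes pandas raise an error if the right-hand table has duplicate key pairs.

```python
import pandas as pd

# Toy stand-ins for the generation and weather tables (values are invented).
plant = pd.DataFrame({
    "DATE_TIME": ["2020-05-15 00:00:00", "2020-05-15 00:15:00", "2020-05-15 00:30:00"],
    "PLANT_ID": [1, 1, 1],
    "SOURCE": ["inverter_a", "inverter_a", "inverter_a"],
    "AC_POWER": [0.0, 10.5, 20.0],
})
weather = pd.DataFrame({
    "DATE_TIME": ["2020-05-15 00:00:00", "2020-05-15 00:15:00"],
    "PLANT_ID": [1, 1],
    "SOURCE": ["sensor_x", "sensor_x"],
    "IRRADIATION": [0.0, 0.1],
})

# Default inner join: only key pairs present in both tables are kept.
combined = plant.merge(
    weather,
    on=["DATE_TIME", "PLANT_ID"],
    suffixes=("_GENERATION", "_WEATHER"),
    validate="many_to_one",
)

print(combined.columns.tolist())
print(len(combined))
```

Because `merge` defaults to an inner join, the third generation row has no matching weather row and is dropped. The shared `SOURCE` column appears twice in the result, once with each suffix.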

# **Start the training and prediction**

In [10]:
# Get feature columns
X2 = combined_file[['AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE', 'IRRADIATION']]

# Get target column
y2 = combined_file['AC_POWER']

# Split the data into train and test
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, test_size=0.3)

# Initialize LinearRegression class
lm2 = LinearRegression()

# Fit the training data
lm2.fit(X2_train, y2_train)

# Get the predictions
predictions = lm2.predict(X2_test)
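
The notebook judges the model only by eye, from the scatter plot. A numeric check can complement that. The sketch below is an illustration rather than part of the original workflow: it runs on synthetic data (three random features and an exactly linear target, both made up) and uses `mean_absolute_error` and `r2_score` from `sklearn.metrics`. The `random_state` arguments are added so that the split is repeatable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 3 features, noise-free linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] - 1.0 * X[:, 2] + 5.0

# Fixing random_state makes the train/test split reproducible between runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
predictions = model.predict(X_test)

# Mean absolute error is in the units of the target; R^2 is unitless.
mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(f"MAE: {mae:.4f}, R^2: {r2:.4f}")
```

On the notebook's own data, the same two calls would be `mean_absolute_error(y2_test, predictions)` and `r2_score(y2_test, predictions)`. The values there will differ from run to run unless `random_state` is also set in the notebook's `train_test_split` call.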

# **Plot the results**

In [None]:
# Actual values go on the x-axis and predicted values on the y-axis
plt.scatter(y2_test, predictions)
plt.title('Actual Solar Output Values vs Predicted Values for Plant 2')
plt.xlabel('Actual Output')
plt.ylabel('Predicted Output')

plt.show()