
# EDA of solar plant data


In [None]:
#importing Python libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from IPython.display import HTML as html_print
from termcolor import colored
from IPython.display import display

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="scipy")

In [None]:
# pandas display options: show all columns, remove the width limit, and format floats to 3 decimal places
# (the max_rows option is left commented out, so long frames are still truncated)
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

In [None]:
#loading the generation data for plant 1
df_1_Generation = pd.read_csv("https://raw.githubusercontent.com/priyadharsh73/solar_plant_analysis/main/Plant_1_Generation_Data.csv")

In [12]:
df_1_Generation.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 68778 entries, 0 to 68777
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   DATE_TIME    68778 non-null  object 
 1   PLANT_ID     68778 non-null  int64  
 2   SOURCE_KEY   68778 non-null  object 
 3   DC_POWER     68778 non-null  float64
 4   AC_POWER     68778 non-null  float64
 5   DAILY_YIELD  68778 non-null  float64
 6   TOTAL_YIELD  68778 non-null  float64
dtypes: float64(4), int64(1), object(2)
memory usage: 3.7+ MB


The solar power generation data represents energy measurements from various sources within a power plant. Let’s break down the columns:

* `DATE_TIME`: The timestamp of each measurement. In this file it is stored as a string in the format “DD-MM-YYYY HH:MM”.
* `PLANT_ID`: An identifier for the power plant. It holds the same value in every row of the file.
* `SOURCE_KEY`: An identifier for the source of each reading. In the generation files this most likely denotes an inverter rather than an individual solar panel.
* `DC_POWER`: The direct current (DC) power generated by the source. The unit is not stated in the file itself; kilowatts is the likely unit.
* `AC_POWER`: The alternating current (AC) power converted from DC, likely in the same unit as `DC_POWER`.
* `DAILY_YIELD`: The energy produced by the source so far on that day. It is a running total that starts again each day; the unit is not stated in the file.
* `TOTAL_YIELD`: The cumulative energy produced by the source up to that timestamp; the unit is not stated in the file.
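
The units listed above are not stated in the file itself, so a quick consistency check can help before relying on them. The sketch below uses made-up values and a hypothetical helper, `dc_to_ac_ratio`, to compare mean DC and AC power over the rows where a source is producing. Since AC power is converted from DC, a ratio far from 1 would suggest that the two columns are not reported on the same scale.

```python
import pandas as pd

def dc_to_ac_ratio(df: pd.DataFrame) -> float:
    """Mean DC power divided by mean AC power, over rows with AC_POWER > 0."""
    producing = df[df["AC_POWER"] > 0]
    return producing["DC_POWER"].mean() / producing["AC_POWER"].mean()

# Illustrative values only; they are not taken from the dataset.
toy = pd.DataFrame({
    "SOURCE_KEY": ["a", "a", "b", "b"],
    "DC_POWER": [0.0, 1000.0, 0.0, 3000.0],
    "AC_POWER": [0.0, 100.0, 0.0, 300.0],
})

ratio = dc_to_ac_ratio(toy)
n_sources = toy["SOURCE_KEY"].nunique()  # number of distinct sources

print(f"DC/AC ratio: {ratio:.1f}")
print("Distinct sources:", n_sources)
```

The same helper could be called on `df_1_Generation` once it is loaded; the toy frame is only there to keep the example self-contained.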

In [None]:
df_1_Generation.head()

Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,DC_POWER,AC_POWER,DAILY_YIELD,TOTAL_YIELD
0,15-05-2020 00:00,4135001,1BY6WEcLGh8j5v7,0.0,0.0,0.0,6259559.0
1,15-05-2020 00:00,4135001,1IF53ai7Xc0U56Y,0.0,0.0,0.0,6183645.0
2,15-05-2020 00:00,4135001,3PZuoBAID5Wc2HD,0.0,0.0,0.0,6987759.0
3,15-05-2020 00:00,4135001,7JYdWkrLSPkdwr4,0.0,0.0,0.0,7602960.0
4,15-05-2020 00:00,4135001,McdE0feGgRqW7Ca,0.0,0.0,0.0,7158964.0


In [33]:
#loading the weather sensor data for plant 1
df_1_Weather = pd.read_csv("https://raw.githubusercontent.com/priyadharsh73/solar_plant_analysis/main/Plant_1_Weather_Sensor_Data.csv")

In [34]:
df_1_Weather.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3182 entries, 0 to 3181
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   DATE_TIME            3182 non-null   object 
 1   PLANT_ID             3182 non-null   int64  
 2   SOURCE_KEY           3182 non-null   object 
 3   AMBIENT_TEMPERATURE  3182 non-null   float64
 4   MODULE_TEMPERATURE   3182 non-null   float64
 5   IRRADIATION          3182 non-null   float64
dtypes: float64(3), int64(1), object(2)
memory usage: 149.3+ KB


The weather sensor data records temperature and irradiation readings for the plant over time. Let’s break down the columns:

* `DATE_TIME`: The timestamp when the data was recorded.
* `PLANT_ID`: An identifier for the specific solar power plant.
* `SOURCE_KEY`: An identifier for the source of the readings. It takes the same value (`HmiyD2TTLFNqkNe`) in every row shown below, which suggests a single sensor unit for the whole plant rather than one specific panel.
* `AMBIENT_TEMPERATURE`: The ambient temperature at the time of measurement (likely in degrees Celsius, judging by the value range).
* `MODULE_TEMPERATURE`: The temperature of the solar module itself.
* `IRRADIATION`: The amount of solar irradiation (sunlight) received by the module.

These parameters are crucial for understanding the performance and efficiency of the solar panels.

In [25]:
df_1_Weather.head()

Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,AMBIENT_TEMPERATURE,MODULE_TEMPERATURE,IRRADIATION
0,2020-05-15 00:00:00,4135001,HmiyD2TTLFNqkNe,25.184,22.858,0.0
1,2020-05-15 00:15:00,4135001,HmiyD2TTLFNqkNe,25.085,22.762,0.0
2,2020-05-15 00:30:00,4135001,HmiyD2TTLFNqkNe,24.936,22.592,0.0
3,2020-05-15 00:45:00,4135001,HmiyD2TTLFNqkNe,24.846,22.361,0.0
4,2020-05-15 01:00:00,4135001,HmiyD2TTLFNqkNe,24.622,22.165,0.0


In [37]:
#loading the generation data for plant 2
df_2_Generation = pd.read_csv("https://raw.githubusercontent.com/priyadharsh73/solar_plant_analysis/main/Plant_2_Generation_Data.csv")

In [43]:
#loading the weather sensor data for plant 2
df_2_Weather = pd.read_csv("https://raw.githubusercontent.com/priyadharsh73/solar_plant_analysis/main/Plant_2_Weather_Sensor_Data.csv")

In [38]:
## exploratory data analysis
# fixing title colors, style for the dataframe
def print_section_title(title):
    print(colored(title, 'green', attrs=['bold', 'italic']))

def display_head_and_tail(dataframe, head=5):
    display(dataframe.head(head).style.set_caption("Head"))
    display(dataframe.tail(head).style.set_caption("Tail"))

In [39]:
def display_na(dataframe):
    na_df = dataframe.isnull().sum().reset_index()
    na_df.columns = ['Column', 'Number of NA']
    display(na_df.style.set_caption("Number of NA Values"))

In [40]:
# summary statistics with selected percentiles (0%, 10%, 25%, 50%, 75%, 90%, 100%) for the numeric columns
def display_quantiles(dataframe):
    quantiles_df = dataframe.describe([0, 0.10, 0.25, 0.50, 0.75, 0.90, 1]).T
    display(quantiles_df.style.format("{:.2f}").set_caption("Quantiles"))

In [41]:
def check_df(dataframe, head=6):
    print_section_title('Shape')
    print(dataframe.shape)
    print_section_title('Types')
    # display() renders the Styler; print() would only show the object's repr
    display(dataframe.dtypes.to_frame('Data Type').style.set_caption("Data Types"))
    print_section_title('Info')
    # info() prints its report itself and returns None, so it is not wrapped in print()
    dataframe.info()
    print_section_title('Head & Tail')
    display_head_and_tail(dataframe, head)
    print_section_title('NA Values')
    display_na(dataframe)
    print_section_title('Quantiles')
    display_quantiles(dataframe)

In [17]:
check_df(df_1_Generation)

Shape
(68778, 7)
Types
<pandas.io.formats.style.Styler object at 0x7e420207b250>
Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 68778 entries, 0 to 68777
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   DATE_TIME    68778 non-null  object 
 1   PLANT_ID     68778 non-null  int64  
 2   SOURCE_KEY   68778 non-null  object 
 3   DC_POWER     68778 non-null  float64
 4   AC_POWER     68778 non-null  float64
 5   DAILY_YIELD  68778 non-null  float64
 6   TOTAL_YIELD  68778 non-null  float64
dtypes: float64(4), int64(1), object(2)
memory usage: 3.7+ MB
None
Head & Tail


Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,DC_POWER,AC_POWER,DAILY_YIELD,TOTAL_YIELD
0,15-05-2020 00:00,4135001,1BY6WEcLGh8j5v7,0.0,0.0,0.0,6259559.0
1,15-05-2020 00:00,4135001,1IF53ai7Xc0U56Y,0.0,0.0,0.0,6183645.0
2,15-05-2020 00:00,4135001,3PZuoBAID5Wc2HD,0.0,0.0,0.0,6987759.0
3,15-05-2020 00:00,4135001,7JYdWkrLSPkdwr4,0.0,0.0,0.0,7602960.0
4,15-05-2020 00:00,4135001,McdE0feGgRqW7Ca,0.0,0.0,0.0,7158964.0
5,15-05-2020 00:00,4135001,VHMLBKoKgIrUVDU,0.0,0.0,0.0,7206408.0


Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,DC_POWER,AC_POWER,DAILY_YIELD,TOTAL_YIELD
68772,17-06-2020 23:45,4135001,sjndEbLyjtCKgGv,0.0,0.0,5887.0,7261681.0
68773,17-06-2020 23:45,4135001,uHbuxQJl8lW7ozc,0.0,0.0,5967.0,7287002.0
68774,17-06-2020 23:45,4135001,wCURE6d3bPkepu2,0.0,0.0,5147.625,7028601.0
68775,17-06-2020 23:45,4135001,z9Y9gH1T5YWrNuG,0.0,0.0,5819.0,7251204.0
68776,17-06-2020 23:45,4135001,zBIq5rxdHJRwDNY,0.0,0.0,5817.0,6583369.0
68777,17-06-2020 23:45,4135001,zVJPv84UY57bAof,0.0,0.0,5910.0,7363272.0


NA Values


Unnamed: 0,Column,Number of NA
0,DATE_TIME,0
1,PLANT_ID,0
2,SOURCE_KEY,0
3,DC_POWER,0
4,AC_POWER,0
5,DAILY_YIELD,0
6,TOTAL_YIELD,0


Quantiles


Unnamed: 0,count,mean,std,min,0%,10%,25%,50%,75%,90%,100%,max
PLANT_ID,68778.0,4135001.0,0.0,4135001.0,4135001.0,4135001.0,4135001.0,4135001.0,4135001.0,4135001.0,4135001.0,4135001.0
DC_POWER,68778.0,3147.43,4036.46,0.0,0.0,0.0,0.0,429.0,6366.96,9762.8,14471.12,14471.12
AC_POWER,68778.0,307.8,394.4,0.0,0.0,0.0,0.0,41.49,623.62,954.07,1410.95,1410.95
DAILY_YIELD,68778.0,3295.97,3145.18,0.0,0.0,0.0,0.0,2658.71,6274.0,7768.61,9163.0,9163.0
TOTAL_YIELD,68778.0,6978711.76,416271.98,6183645.0,6183645.0,6350673.17,6512002.54,7146685.0,7268705.91,7373268.0,7846821.0,7846821.0


In [21]:
check_df(df_1_Weather)

Shape
(3182, 6)
Types
<pandas.io.formats.style.Styler object at 0x7e4201e8fd90>
Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3182 entries, 0 to 3181
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   DATE_TIME            3182 non-null   object 
 1   PLANT_ID             3182 non-null   int64  
 2   SOURCE_KEY           3182 non-null   object 
 3   AMBIENT_TEMPERATURE  3182 non-null   float64
 4   MODULE_TEMPERATURE   3182 non-null   float64
 5   IRRADIATION          3182 non-null   float64
dtypes: float64(3), int64(1), object(2)
memory usage: 149.3+ KB
None
Head & Tail


Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,AMBIENT_TEMPERATURE,MODULE_TEMPERATURE,IRRADIATION
0,2020-05-15 00:00:00,4135001,HmiyD2TTLFNqkNe,25.184316,22.857507,0.0
1,2020-05-15 00:15:00,4135001,HmiyD2TTLFNqkNe,25.084589,22.761668,0.0
2,2020-05-15 00:30:00,4135001,HmiyD2TTLFNqkNe,24.935753,22.592306,0.0
3,2020-05-15 00:45:00,4135001,HmiyD2TTLFNqkNe,24.84613,22.360852,0.0
4,2020-05-15 01:00:00,4135001,HmiyD2TTLFNqkNe,24.621525,22.165423,0.0
5,2020-05-15 01:15:00,4135001,HmiyD2TTLFNqkNe,24.536092,21.968571,0.0


Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,AMBIENT_TEMPERATURE,MODULE_TEMPERATURE,IRRADIATION
3176,2020-06-17 22:30:00,4135001,HmiyD2TTLFNqkNe,22.171737,21.080829,0.0
3177,2020-06-17 22:45:00,4135001,HmiyD2TTLFNqkNe,22.15057,21.480377,0.0
3178,2020-06-17 23:00:00,4135001,HmiyD2TTLFNqkNe,22.129816,21.389024,0.0
3179,2020-06-17 23:15:00,4135001,HmiyD2TTLFNqkNe,22.008275,20.709211,0.0
3180,2020-06-17 23:30:00,4135001,HmiyD2TTLFNqkNe,21.969495,20.734963,0.0
3181,2020-06-17 23:45:00,4135001,HmiyD2TTLFNqkNe,21.909288,20.427972,0.0


NA Values


Unnamed: 0,Column,Number of NA
0,DATE_TIME,0
1,PLANT_ID,0
2,SOURCE_KEY,0
3,AMBIENT_TEMPERATURE,0
4,MODULE_TEMPERATURE,0
5,IRRADIATION,0


Quantiles


Unnamed: 0,count,mean,std,min,0%,10%,25%,50%,75%,90%,100%,max
PLANT_ID,3182.0,4135001.0,0.0,4135001.0,4135001.0,4135001.0,4135001.0,4135001.0,4135001.0,4135001.0,4135001.0,4135001.0
AMBIENT_TEMPERATURE,3182.0,25.53,3.35,20.4,20.4,21.88,22.71,24.61,27.92,30.52,35.25,35.25
MODULE_TEMPERATURE,3182.0,31.09,12.26,18.14,18.14,20.19,21.09,24.62,41.31,50.61,65.55,65.55
IRRADIATION,3182.0,0.23,0.3,0.0,0.0,0.0,0.0,0.02,0.45,0.72,1.22,1.22


In [None]:
check_df(df_2_Generation)

Shape
(67698, 7)
Types
<pandas.io.formats.style.Styler object at 0x7e1f35ffeef0>
Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67698 entries, 0 to 67697
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   DATE_TIME    67698 non-null  object 
 1   PLANT_ID     67698 non-null  int64  
 2   SOURCE_KEY   67698 non-null  object 
 3   DC_POWER     67698 non-null  float64
 4   AC_POWER     67698 non-null  float64
 5   DAILY_YIELD  67698 non-null  float64
 6   TOTAL_YIELD  67698 non-null  float64
dtypes: float64(4), int64(1), object(2)
memory usage: 3.6+ MB
None
Head & Tail


Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,DC_POWER,AC_POWER,DAILY_YIELD,TOTAL_YIELD
0,2020-05-15 00:00:00,4136001,4UPUqMRk7TRMgml,0.0,0.0,9425.0,2429011.0
1,2020-05-15 00:00:00,4136001,81aHJ1q11NBPMrL,0.0,0.0,0.0,1215278736.0
2,2020-05-15 00:00:00,4136001,9kRcWv60rDACzjR,0.0,0.0,3075.333333,2247719577.0
3,2020-05-15 00:00:00,4136001,Et9kgGMDl729KT4,0.0,0.0,269.933333,1704250.0
4,2020-05-15 00:00:00,4136001,IQ2d7wF4YD8zU1Q,0.0,0.0,3177.0,19941526.0
5,2020-05-15 00:00:00,4136001,LYwnQax7tkwH5Cb,0.0,0.0,1872.5,1794958634.0


Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,DC_POWER,AC_POWER,DAILY_YIELD,TOTAL_YIELD
67692,2020-06-17 23:45:00,4136001,oZZkBaNadn6DNKz,0.0,0.0,4389.0,1708287724.0
67693,2020-06-17 23:45:00,4136001,q49J1IKaHRwDQnt,0.0,0.0,4157.0,520758.0
67694,2020-06-17 23:45:00,4136001,rrq4fwE8jgrTyWY,0.0,0.0,3931.0,121131356.0
67695,2020-06-17 23:45:00,4136001,vOuJvMaM2sgwLmb,0.0,0.0,4322.0,2427691.0
67696,2020-06-17 23:45:00,4136001,xMbIugepa2P7lBB,0.0,0.0,4218.0,106896394.0
67697,2020-06-17 23:45:00,4136001,xoJJ8DcxJEcupym,0.0,0.0,4316.0,209335741.0


NA Values


Unnamed: 0,Column,Number of NA
0,DATE_TIME,0
1,PLANT_ID,0
2,SOURCE_KEY,0
3,DC_POWER,0
4,AC_POWER,0
5,DAILY_YIELD,0
6,TOTAL_YIELD,0


Quantiles


Unnamed: 0,count,mean,std,min,0%,10%,25%,50%,75%,90%,100%,max
PLANT_ID,67698.0,4136001.0,0.0,4136001.0,4136001.0,4136001.0,4136001.0,4136001.0,4136001.0,4136001.0,4136001.0,4136001.0
DC_POWER,67698.0,246.7,370.57,0.0,0.0,0.0,0.0,0.0,446.59,887.31,1420.93,1420.93
AC_POWER,67698.0,241.28,362.11,0.0,0.0,0.0,0.0,0.0,438.22,867.88,1385.42,1385.42
DAILY_YIELD,67698.0,3294.89,2919.45,0.0,0.0,0.0,272.75,2911.0,5534.0,7760.35,9873.0,9873.0
TOTAL_YIELD,67698.0,658944788.42,729667771.07,0.0,0.0,1842629.1,19964944.87,282627587.0,1348495113.0,1708271611.0,2247916295.0,2247916295.0


In [44]:
check_df(df_2_Weather)

Shape
(3259, 6)
Types
<pandas.io.formats.style.Styler object at 0x7e4200ede110>
Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3259 entries, 0 to 3258
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   DATE_TIME            3259 non-null   object 
 1   PLANT_ID             3259 non-null   int64  
 2   SOURCE_KEY           3259 non-null   object 
 3   AMBIENT_TEMPERATURE  3259 non-null   float64
 4   MODULE_TEMPERATURE   3259 non-null   float64
 5   IRRADIATION          3259 non-null   float64
dtypes: float64(3), int64(1), object(2)
memory usage: 152.9+ KB
None
Head & Tail


Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,AMBIENT_TEMPERATURE,MODULE_TEMPERATURE,IRRADIATION
0,2020-05-15 00:00:00,4136001,iq8k7ZNt4Mwm3w0,27.004764,25.060789,0.0
1,2020-05-15 00:15:00,4136001,iq8k7ZNt4Mwm3w0,26.880811,24.421869,0.0
2,2020-05-15 00:30:00,4136001,iq8k7ZNt4Mwm3w0,26.682055,24.42729,0.0
3,2020-05-15 00:45:00,4136001,iq8k7ZNt4Mwm3w0,26.500589,24.420678,0.0
4,2020-05-15 01:00:00,4136001,iq8k7ZNt4Mwm3w0,26.596148,25.08821,0.0
5,2020-05-15 01:15:00,4136001,iq8k7ZNt4Mwm3w0,26.51274,25.31797,0.0


Unnamed: 0,DATE_TIME,PLANT_ID,SOURCE_KEY,AMBIENT_TEMPERATURE,MODULE_TEMPERATURE,IRRADIATION
3253,2020-06-17 22:30:00,4136001,iq8k7ZNt4Mwm3w0,23.628108,23.02921,0.0
3254,2020-06-17 22:45:00,4136001,iq8k7ZNt4Mwm3w0,23.511703,22.856201,0.0
3255,2020-06-17 23:00:00,4136001,iq8k7ZNt4Mwm3w0,23.482282,22.74419,0.0
3256,2020-06-17 23:15:00,4136001,iq8k7ZNt4Mwm3w0,23.354743,22.492245,0.0
3257,2020-06-17 23:30:00,4136001,iq8k7ZNt4Mwm3w0,23.291048,22.373909,0.0
3258,2020-06-17 23:45:00,4136001,iq8k7ZNt4Mwm3w0,23.202871,22.535908,0.0


NA Values


Unnamed: 0,Column,Number of NA
0,DATE_TIME,0
1,PLANT_ID,0
2,SOURCE_KEY,0
3,AMBIENT_TEMPERATURE,0
4,MODULE_TEMPERATURE,0
5,IRRADIATION,0


Quantiles


Unnamed: 0,count,mean,std,min,0%,10%,25%,50%,75%,90%,100%,max
PLANT_ID,3259.0,4136001.0,0.0,4136001.0,4136001.0,4136001.0,4136001.0,4136001.0,4136001.0,4136001.0,4136001.0,4136001.0
AMBIENT_TEMPERATURE,3259.0,28.07,4.06,20.94,20.94,23.66,24.6,26.98,31.06,34.27,39.18,39.18
MODULE_TEMPERATURE,3259.0,32.77,11.34,20.27,20.27,22.46,23.72,27.53,40.48,51.61,66.64,66.64
IRRADIATION,3259.0,0.23,0.31,0.0,0.0,0.0,0.0,0.02,0.44,0.79,1.1,1.1


In [18]:
print(df_1_Generation.columns) # Print the columns of the DataFrame
df_1_Generation.drop('PLANT_ID', axis=1, inplace=True)
print(df_1_Generation.columns)
print(df_1_Generation.shape)

Index(['DATE_TIME', 'PLANT_ID', 'SOURCE_KEY', 'DC_POWER', 'AC_POWER', 'DAILY_YIELD',
       'TOTAL_YIELD'],
      dtype='object')
Index(['DATE_TIME', 'SOURCE_KEY', 'DC_POWER', 'AC_POWER', 'DAILY_YIELD', 'TOTAL_YIELD'], dtype='object')
(68778, 6)


In pandas, the `drop()` method removes specific rows or columns from a DataFrame. Here it removes `PLANT_ID`, which holds a single constant value in each file (its standard deviation is 0 in the summaries above) and therefore carries no information for the analysis.
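
As a minimal sketch of how `drop()` behaves, the example below uses a small made-up frame (the variable names are illustrative, not from the notebook). It shows the non-in-place form, which returns a new DataFrame, and the `errors="ignore"` option. The latter can be handy in a notebook, because re-running a cell that drops an already removed column would otherwise raise a `KeyError`.

```python
import pandas as pd

# Small illustrative frame; only the column names mirror the dataset.
toy = pd.DataFrame({
    "DATE_TIME": ["15-05-2020 00:00", "15-05-2020 00:15"],
    "PLANT_ID": [4135001, 4135001],
    "DC_POWER": [0.0, 1.5],
})

# Columns holding a single value carry no information and are candidates for dropping.
constant_cols = [c for c in toy.columns if toy[c].nunique() == 1]

# Without inplace=True, drop() leaves `toy` untouched and returns a new DataFrame.
trimmed = toy.drop(columns=constant_cols)

# errors="ignore" makes a repeated drop a no-op instead of raising KeyError.
trimmed_again = trimmed.drop(columns=["PLANT_ID"], errors="ignore")

print(constant_cols)
print(list(trimmed.columns))
```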

In [20]:
print(df_1_Generation)

              DATE_TIME       SOURCE_KEY  DC_POWER  AC_POWER  DAILY_YIELD  TOTAL_YIELD
0      15-05-2020 00:00  1BY6WEcLGh8j5v7     0.000     0.000        0.000  6259559.000
1      15-05-2020 00:00  1IF53ai7Xc0U56Y     0.000     0.000        0.000  6183645.000
2      15-05-2020 00:00  3PZuoBAID5Wc2HD     0.000     0.000        0.000  6987759.000
3      15-05-2020 00:00  7JYdWkrLSPkdwr4     0.000     0.000        0.000  7602960.000
4      15-05-2020 00:00  McdE0feGgRqW7Ca     0.000     0.000        0.000  7158964.000
...                 ...              ...       ...       ...          ...          ...
68773  17-06-2020 23:45  uHbuxQJl8lW7ozc     0.000     0.000     5967.000  7287002.000
68774  17-06-2020 23:45  wCURE6d3bPkepu2     0.000     0.000     5147.625  7028601.000
68775  17-06-2020 23:45  z9Y9gH1T5YWrNuG     0.000     0.000     5819.000  7251204.000
68776  17-06-2020 23:45  zBIq5rxdHJRwDNY     0.000     0.000     5817.000  6583369.000
68777  17-06-2020 23:45  zVJPv84UY57bAof   

In [35]:
print(df_1_Weather.columns) # Print the columns of the DataFrame
df_1_Weather.drop('PLANT_ID', axis=1, inplace=True)
print(df_1_Weather.columns)
print(df_1_Weather.shape)

Index(['DATE_TIME', 'PLANT_ID', 'SOURCE_KEY', 'AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE',
       'IRRADIATION'],
      dtype='object')
Index(['DATE_TIME', 'SOURCE_KEY', 'AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE', 'IRRADIATION'], dtype='object')
(3182, 5)


In [36]:
print(df_1_Weather)

                DATE_TIME       SOURCE_KEY  AMBIENT_TEMPERATURE  MODULE_TEMPERATURE  IRRADIATION
0     2020-05-15 00:00:00  HmiyD2TTLFNqkNe               25.184              22.858        0.000
1     2020-05-15 00:15:00  HmiyD2TTLFNqkNe               25.085              22.762        0.000
2     2020-05-15 00:30:00  HmiyD2TTLFNqkNe               24.936              22.592        0.000
3     2020-05-15 00:45:00  HmiyD2TTLFNqkNe               24.846              22.361        0.000
4     2020-05-15 01:00:00  HmiyD2TTLFNqkNe               24.622              22.165        0.000
...                   ...              ...                  ...                 ...          ...
3177  2020-06-17 22:45:00  HmiyD2TTLFNqkNe               22.151              21.480        0.000
3178  2020-06-17 23:00:00  HmiyD2TTLFNqkNe               22.130              21.389        0.000
3179  2020-06-17 23:15:00  HmiyD2TTLFNqkNe               22.008              20.709        0.000
3180  2020-06-17 23:30:00  Hmi

In [42]:
print(df_2_Generation.columns) # Print the columns of the DataFrame
df_2_Generation.drop('PLANT_ID', axis=1, inplace=True)
print(df_2_Generation.columns)
print(df_2_Generation.shape)
print(df_2_Generation)

Index(['DATE_TIME', 'PLANT_ID', 'SOURCE_KEY', 'DC_POWER', 'AC_POWER', 'DAILY_YIELD',
       'TOTAL_YIELD'],
      dtype='object')
Index(['DATE_TIME', 'SOURCE_KEY', 'DC_POWER', 'AC_POWER', 'DAILY_YIELD', 'TOTAL_YIELD'], dtype='object')
(67698, 6)
                 DATE_TIME       SOURCE_KEY  DC_POWER  AC_POWER  DAILY_YIELD    TOTAL_YIELD
0      2020-05-15 00:00:00  4UPUqMRk7TRMgml     0.000     0.000     9425.000    2429011.000
1      2020-05-15 00:00:00  81aHJ1q11NBPMrL     0.000     0.000        0.000 1215278736.000
2      2020-05-15 00:00:00  9kRcWv60rDACzjR     0.000     0.000     3075.333 2247719577.000
3      2020-05-15 00:00:00  Et9kgGMDl729KT4     0.000     0.000      269.933    1704250.000
4      2020-05-15 00:00:00  IQ2d7wF4YD8zU1Q     0.000     0.000     3177.000   19941526.000
...                    ...              ...       ...       ...          ...            ...
67693  2020-06-17 23:45:00  q49J1IKaHRwDQnt     0.000     0.000     4157.000     520758.000
67694  2020-06-17 

In [45]:
print(df_2_Weather.columns) # Print the columns of the DataFrame
df_2_Weather.drop('PLANT_ID', axis=1, inplace=True)
print(df_2_Weather.columns)
print(df_2_Weather.shape)
print(df_2_Weather)

Index(['DATE_TIME', 'PLANT_ID', 'SOURCE_KEY', 'AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE',
       'IRRADIATION'],
      dtype='object')
Index(['DATE_TIME', 'SOURCE_KEY', 'AMBIENT_TEMPERATURE', 'MODULE_TEMPERATURE', 'IRRADIATION'], dtype='object')
(3259, 5)
                DATE_TIME       SOURCE_KEY  AMBIENT_TEMPERATURE  MODULE_TEMPERATURE  IRRADIATION
0     2020-05-15 00:00:00  iq8k7ZNt4Mwm3w0               27.005              25.061        0.000
1     2020-05-15 00:15:00  iq8k7ZNt4Mwm3w0               26.881              24.422        0.000
2     2020-05-15 00:30:00  iq8k7ZNt4Mwm3w0               26.682              24.427        0.000
3     2020-05-15 00:45:00  iq8k7ZNt4Mwm3w0               26.501              24.421        0.000
4     2020-05-15 01:00:00  iq8k7ZNt4Mwm3w0               26.596              25.088        0.000
...                   ...              ...                  ...                 ...          ...
3254  2020-06-17 22:45:00  iq8k7ZNt4Mwm3w0               23.512 

In [50]:
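# Plant 1 generation timestamps are day-first (e.g. "15-05-2020 00:00"), unlike the other three files
df_1_Generation['DATE_TIME'] = pd.to_datetime(df_1_Generation['DATE_TIME'], format='%d-%m-%Y %H:%M')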

df_1_Weather['DATE_TIME'] = pd.to_datetime(df_1_Weather['DATE_TIME'], format='%Y-%m-%d %H:%M:%S')

df_2_Generation['DATE_TIME'] = pd.to_datetime(df_2_Generation['DATE_TIME'], format='%Y-%m-%d %H:%M:%S')

df_2_Weather['DATE_TIME'] = pd.to_datetime(df_2_Weather['DATE_TIME'], format='%Y-%m-%d %H:%M:%S')


The `pd.to_datetime()` function converts the date and time strings in each DataFrame to datetime objects, which makes time-based operations such as sorting, filtering by date, and resampling possible. The `format` argument has to match the way each file writes its timestamps.
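
The two timestamp layouts that appear in these files can be parsed as in the sketch below, which uses a few sample strings copied from the outputs above. The variable names are illustrative, and the `errors="coerce"` call is an optional extra rather than something the notebook itself uses.

```python
import pandas as pd

# Day-first layout, as in the Plant 1 generation file.
day_first = pd.Series(["15-05-2020 00:00", "17-06-2020 23:45"])
# ISO-like layout, as in the other three files.
iso_like = pd.Series(["2020-05-15 00:00:00", "2020-06-17 23:45:00"])

parsed_day_first = pd.to_datetime(day_first, format="%d-%m-%Y %H:%M")
parsed_iso_like = pd.to_datetime(iso_like, format="%Y-%m-%d %H:%M:%S")

# Both layouts describe the same instants once parsed.
same_instants = (parsed_day_first == parsed_iso_like).all()

# errors="coerce" turns strings that do not match the format into NaT instead of raising.
with_bad_value = pd.to_datetime(
    pd.Series(["15-05-2020 00:00", "not a date"]),
    format="%d-%m-%Y %H:%M",
    errors="coerce",
)

print(same_instants)
print(with_bad_value.isna().tolist())
```

Passing an explicit `format` avoids any guessing about whether the day or the month comes first, which matters for dates such as 05-06-2020.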

In [51]:
print(df_1_Generation)

                DATE_TIME       SOURCE_KEY  DC_POWER  AC_POWER  DAILY_YIELD  TOTAL_YIELD
0     2020-05-15 00:00:00  1BY6WEcLGh8j5v7     0.000     0.000        0.000  6259559.000
1     2020-05-15 00:00:00  1IF53ai7Xc0U56Y     0.000     0.000        0.000  6183645.000
2     2020-05-15 00:00:00  3PZuoBAID5Wc2HD     0.000     0.000        0.000  6987759.000
3     2020-05-15 00:00:00  7JYdWkrLSPkdwr4     0.000     0.000        0.000  7602960.000
4     2020-05-15 00:00:00  McdE0feGgRqW7Ca     0.000     0.000        0.000  7158964.000
...                   ...              ...       ...       ...          ...          ...
68773 2020-06-17 23:45:00  uHbuxQJl8lW7ozc     0.000     0.000     5967.000  7287002.000
68774 2020-06-17 23:45:00  wCURE6d3bPkepu2     0.000     0.000     5147.625  7028601.000
68775 2020-06-17 23:45:00  z9Y9gH1T5YWrNuG     0.000     0.000     5819.000  7251204.000
68776 2020-06-17 23:45:00  zBIq5rxdHJRwDNY     0.000     0.000     5817.000  6583369.000
68777 2020-06-17 23:4

In [52]:
date_time_types = {
    'df_1_Generation_DATE_TIME': df_1_Generation['DATE_TIME'].dtype,
    'df_1_Weather_DATE_TIME': df_1_Weather['DATE_TIME'].dtype,
    'df_2_Generation_DATE_TIME': df_2_Generation['DATE_TIME'].dtype,
    'df_2_Weather_DATE_TIME': df_2_Weather['DATE_TIME'].dtype
}

date_time_types

{'df_1_Generation_DATE_TIME': dtype('<M8[ns]'),
 'df_1_Weather_DATE_TIME': dtype('<M8[ns]'),
 'df_2_Generation_DATE_TIME': dtype('<M8[ns]'),
 'df_2_Weather_DATE_TIME': dtype('<M8[ns]')}