In this notebook, we explore data from solar power plants in India and look for interesting facts. <br>
The dataset has only 13 columns covering 2 solar power plants, yet it yields a surprising amount of information. <br>
<br>
So let's begin. 

In [None]:
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from matplotlib import cm

In [None]:
path = '../input/solar-power-generation-data'

plant1_generation = pd.read_csv(f'{path}/Plant_1_Generation_Data.csv')
plant1_sensor = pd.read_csv(f'{path}/Plant_1_Weather_Sensor_Data.csv')
plant2_generation = pd.read_csv(f'{path}/Plant_2_Generation_Data.csv')
plant2_sensor = pd.read_csv(f'{path}/Plant_2_Weather_Sensor_Data.csv')

# Data Exploration
## Take a look
To start, let's take a look at each dataset.

In [None]:
plant1_generation.head()

In [None]:
plant2_generation.head()

In [None]:
plant1_sensor.head()

In [None]:
plant2_sensor.head()

In [None]:
plant1_generation.describe()

In [None]:
plant2_generation.describe()

In [None]:
plant1_sensor.describe()

In [None]:
plant2_sensor.describe()

We don't want to see nulls in the dataset.

In [None]:
print(plant1_generation.isna().sum())
print(plant1_sensor.isna().sum())
print(plant2_generation.isna().sum())
print(plant2_sensor.isna().sum())

In [None]:
# Do some feature engineering for the sake of analysis
# Caution! Plant 1 generation data uses a different DATE_TIME format (day first),
# so the format must be given explicitly or the dates may be parsed incorrectly.
plant1_generation['DATE_TIME'] = pd.to_datetime(plant1_generation['DATE_TIME'], format='%d-%m-%Y %H:%M')
plant1_sensor['DATE_TIME'] = pd.to_datetime(plant1_sensor['DATE_TIME'])
plant2_generation['DATE_TIME'] = pd.to_datetime(plant2_generation['DATE_TIME'])
plant2_sensor['DATE_TIME'] = pd.to_datetime(plant2_sensor['DATE_TIME'])

plant1_generation['DATE'] = plant1_generation['DATE_TIME'].dt.date
plant1_sensor['DATE'] = plant1_sensor['DATE_TIME'].dt.date
plant2_generation['DATE'] = plant2_generation['DATE_TIME'].dt.date
plant2_sensor['DATE'] = plant2_sensor['DATE_TIME'].dt.date

Okay, so far so good. 

### Obvious findings

1. DATE_TIME in the generation data for plant 1 uses a different format. It has to be handled explicitly. 
1. No nulls are found. Great.
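
To see why the explicit format matters, here is a minimal sketch with a made-up timestamp string (not taken from the dataset). When the day is 12 or less, a day-first string is ambiguous, and `pd.to_datetime` reads it as month first unless told otherwise.

```python
import pandas as pd

# Hypothetical value, meant as 5 June 2020 in day-first notation.
raw = "05-06-2020 00:00"

guessed = pd.to_datetime(raw)                             # read as month first: 6 May
explicit = pd.to_datetime(raw, format="%d-%m-%Y %H:%M")   # read as day first: 5 June

print(guessed)
print(explicit)
```

Both calls succeed without an error, which is what makes this kind of mistake easy to miss.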


### Questions we should answer

1. The ratio of DC power to AC power looks different between plant 1 and plant 2. At plant 1, DC is much higher than AC. At plant 2, the two are almost the same. We don't know what is going on here yet.
1. For both DC power and AC power, 50% of the plant 2 records are 0, which means the plant is not producing for some reason. What is the reason? And why does plant 2 yield less DC and AC power?
1. Even though plant 2 yields less DC and AC power, its TOTAL_YIELD is far higher, by about two digits. Why?

# Take a closer look...

We should check the time span of the data for each plant and sensor. If the time frames differ, we have to take that into account.

In [None]:
def print_time_frame(data, label):
    print(f"{label}  -  start:{data['DATE_TIME'].min()}   end:{data['DATE_TIME'].max()}")

print_time_frame(plant1_generation, 'Plant1 ')
print_time_frame(plant2_generation, 'Plant2 ')
print_time_frame(plant1_sensor, 'Sensor1')
print_time_frame(plant2_sensor, 'Sensor2')

## Plants

As the data description says, DAILY_YIELD is the cumulative sum of power generated on that day. <br>
To get the final amount generated on a day, we have to look at the last record of that day. 

In [None]:
daily_yield_for_each_plant = plant1_generation.sort_values(['SOURCE_KEY', 'DATE_TIME', 'DAILY_YIELD']).drop_duplicates(['SOURCE_KEY', 'DATE'], keep='last')
mean_daily_yield1 = daily_yield_for_each_plant['DAILY_YIELD'].mean()

daily_yield_for_each_plant = plant2_generation.sort_values(['SOURCE_KEY', 'DATE_TIME', 'DAILY_YIELD']).drop_duplicates(['SOURCE_KEY', 'DATE'], keep='last')
mean_daily_yield2 = daily_yield_for_each_plant['DAILY_YIELD'].mean()

print(f'Mean daily yield of each plant - plant1: {mean_daily_yield1}  plant2: {mean_daily_yield2}')
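
The "last record of each day" idea is easier to check on a tiny example. The sketch below uses made-up numbers and hypothetical inverter names (`inv_a`, `inv_b`), and it uses `groupby(...).last()` as an alternative to `drop_duplicates(keep='last')`.

```python
import pandas as pd

# Toy data: DAILY_YIELD accumulates during the day for two hypothetical inverters.
toy = pd.DataFrame({
    "SOURCE_KEY": ["inv_a"] * 3 + ["inv_b"] * 3,
    "DATE_TIME": pd.to_datetime(
        ["2020-05-15 06:00", "2020-05-15 12:00", "2020-05-15 18:00"] * 2
    ),
    "DAILY_YIELD": [0.0, 2500.0, 6000.0, 0.0, 2000.0, 5000.0],
})
toy["DATE"] = toy["DATE_TIME"].dt.date

# Sort by time, then keep the last DAILY_YIELD of each inverter and day.
final_yield = (
    toy.sort_values(["SOURCE_KEY", "DATE_TIME"])
       .groupby(["SOURCE_KEY", "DATE"])["DAILY_YIELD"]
       .last()
       .reset_index()
)
mean_final_yield = final_yield["DAILY_YIELD"].mean()

print(final_yield)
print(mean_final_yield)
```

Taking the mean over all records instead would mix in the partial morning values and understate the daily yield.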

Let's rank the inverters of each plant by mean DAILY_YIELD to find the best performers.

In [None]:
def show_ranking_of_inverter_daily_yield_mean(data, label=''):
    data = data.sort_values(['SOURCE_KEY', 'DATE_TIME', 'DAILY_YIELD']).drop_duplicates(['SOURCE_KEY', 'DATE'], keep='last')

    # DataFrame.append was removed in pandas 2.0, so aggregate with groupby instead.
    # sort=False keeps the inverters in order of first appearance.
    sources = (
        data.groupby('SOURCE_KEY', sort=False)['DAILY_YIELD']
            .mean()
            .rename('DAILY_YIELD_MEAN')
            .reset_index()
    )

    print(f' {label} - Ranking of mean daily yield for each source')
    print(sources.sort_values(['DAILY_YIELD_MEAN'], ascending=False).reset_index(drop=True))
    
    return sources

In [None]:
daily_yield_mean1 = show_ranking_of_inverter_daily_yield_mean(plant1_generation, 'Plant1')
print('')
daily_yield_mean2 = show_ranking_of_inverter_daily_yield_mean(plant2_generation, 'Plant2')

Since TOTAL_YIELD does not start from zero, we can assume that it has been accumulating since the inverters were installed. <br>

In [None]:
beginning_of_total_yield_plant1 = plant1_generation.sort_values(['SOURCE_KEY', 'DATE_TIME']).drop_duplicates('SOURCE_KEY', keep='first')[['SOURCE_KEY', 'TOTAL_YIELD']].sort_values('TOTAL_YIELD', ascending=False)
beginning_of_total_yield_plant2 = plant2_generation.sort_values(['SOURCE_KEY', 'DATE_TIME']).drop_duplicates('SOURCE_KEY', keep='first')[['SOURCE_KEY', 'TOTAL_YIELD']].sort_values('TOTAL_YIELD', ascending=False)

print('-- The initial values of TOTAL_YIELD at plant 1')
print(beginning_of_total_yield_plant1)

print('')
print('-- The initial values of TOTAL_YIELD at plant 2')
print(beginning_of_total_yield_plant2)

We can see a huge gap between inverters.<br>
To put them on the same starting line, we subtract each inverter's initial value from its TOTAL_YIELD.<br>
From now on, TOTAL_YIELD means the cumulative power generated within this time frame.

In [None]:
plant1_generation = plant1_generation.merge(beginning_of_total_yield_plant1, on='SOURCE_KEY', how='left', suffixes=['', '_init'])
plant1_generation['TOTAL_YIELD'] -= plant1_generation['TOTAL_YIELD_init']

plant2_generation = plant2_generation.merge(beginning_of_total_yield_plant2, on='SOURCE_KEY', how='left', suffixes=['', '_init'])
plant2_generation['TOTAL_YIELD'] -= plant2_generation['TOTAL_YIELD_init']
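
As a cross-check of the baseline subtraction, here is a small sketch on made-up numbers. It is not the approach used above: instead of merging the initial values back in, it uses `groupby(...).transform('first')`, and the column name `YIELD_IN_WINDOW` is only for illustration.

```python
import pandas as pd

# Toy data: two hypothetical inverters with very different lifetime totals.
toy = pd.DataFrame({
    "SOURCE_KEY": ["inv_a", "inv_a", "inv_b", "inv_b"],
    "DATE_TIME": pd.to_datetime(["2020-05-15", "2020-05-16"] * 2),
    "TOTAL_YIELD": [1000.0, 1050.0, 9000.0, 9040.0],
})

# After sorting by time, the first TOTAL_YIELD of each inverter is its baseline.
toy = toy.sort_values(["SOURCE_KEY", "DATE_TIME"])
baseline = toy.groupby("SOURCE_KEY")["TOTAL_YIELD"].transform("first")
toy["YIELD_IN_WINDOW"] = toy["TOTAL_YIELD"] - baseline

print(toy)
```

After the subtraction, `inv_b` no longer looks better just because it started with a larger lifetime total.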

In [None]:
def show_ranking_of_inverter_total_yield(data, label=''):
    pd.options.display.float_format = '{:.0f}'.format
    tmp = data.groupby(['SOURCE_KEY']).agg({'TOTAL_YIELD':'max'}).reset_index().sort_values(['TOTAL_YIELD'], ascending=False).reset_index(drop=True)
    print(tmp)
    
    return tmp    

Now, we can rank TOTAL_YIELD in the time frame we got. 

In [None]:
def show_ranking_total_yield():
    print('ranking of TOTAL_YIELD - plant 1')
    total_yield1 = show_ranking_of_inverter_total_yield(plant1_generation)
    print('')
    print('ranking of TOTAL_YIELD - plant 2')
    total_yield2 = show_ranking_of_inverter_total_yield(plant2_generation)
    
    return total_yield1, total_yield2
    
total_yield1, total_yield2 = show_ranking_total_yield()

Let's combine the results into a table.

In [None]:
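# Naming note: after the second merge, TOTAL_YIELD_init is the yield accumulated within this time frame
# (counted from the initial value), and TOTAL_YIELD_all is the TOTAL_YIELD value at the first record.
tmp = daily_yield_mean1.merge(total_yield1, on='SOURCE_KEY', how='left')
tmp = tmp.merge(beginning_of_total_yield_plant1, on='SOURCE_KEY', how='left', suffixes=['_init', '_all']).sort_values('TOTAL_YIELD_init', ascending=False).reset_index(drop=True)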
print(tmp)

print('')
tmp = daily_yield_mean2.merge(total_yield2, on='SOURCE_KEY', how='left')
tmp = tmp.merge(beginning_of_total_yield_plant2, on='SOURCE_KEY', how='left', suffixes=['_init', '_all']).sort_values('TOTAL_YIELD_init', ascending=False).reset_index(drop=True)
print(tmp)

## Sensors

To be safe, let's confirm that there is only one sensor installed at each plant.

In [None]:
print(f"Numbers of sensors at plant1 : {len(plant1_sensor['SOURCE_KEY'].unique())}")
print(f"Numbers of sensors at plant2 : {len(plant2_sensor['SOURCE_KEY'].unique())}")

We should also look at the relation between plant data and sensor data, but that is easier with graphs, so we will analyze it a bit later.

We've got some more findings and questions.

### Findings

1. The time frame is the same for all datasets, so it is safe to use DATE_TIME as an axis without special care. 
1. Each plant has only 1 sensor, because each sensor dataset contains only 1 SOURCE_KEY. 
1. Both plants have the same number of inverters: 22.
1. Plant 2 has a significantly larger TOTAL_YIELD, so we can assume that plant 2 was established far earlier than plant 1. At plant 2 we also see a huge gap between the top inverter and the bottom inverter: it seems that some inverters have been running for a long time and some were installed recently. At plant 1, on the other hand, most inverters have a similar TOTAL_YIELD, so most or all of them were probably installed at the same time. 
1. Looking at the TOTAL_YIELD ranking, almost all inverters at plant 1 perform similarly. At plant 2, in contrast, some inverters appear to underperform significantly.
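
The inverter count can be checked the same way as the sensor count. This is a minimal sketch on toy data; `records_per_source` is a hypothetical helper name, and on the real frames you would pass `plant1_generation` or `plant2_generation`.

```python
import pandas as pd

def records_per_source(frame):
    """Number of records per SOURCE_KEY; the length of the result is the source count."""
    return frame.groupby("SOURCE_KEY").size().sort_values(ascending=False)

# Toy data with three hypothetical inverters.
toy_generation = pd.DataFrame({
    "SOURCE_KEY": ["inv_a", "inv_a", "inv_b", "inv_c", "inv_c", "inv_c"],
    "DC_POWER": [0.0, 10.0, 12.0, 0.0, 9.0, 11.0],
})

counts = records_per_source(toy_generation)
print(f"Number of inverters: {len(counts)}")
print(counts)
```

The per-inverter record counts are a useful side product: an inverter with clearly fewer records than the others is probably missing data.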

### Questions we should answer

1. The inverter ranking shows a huge gap in TOTAL_YIELD_init between the best inverter and the worst one. Judging by TOTAL_YIELD_all, the underperforming inverters are not always the old ones. So what is the reason?

## Let's visualize them
Now that we have some facts and many questions, let's visualize the data to get ideas for answering them.

### Q: Huge difference in DC power between plant 1 and plant 2
Let's try answering the following question.

*The ratio of DC power to AC power looks different between plant 1 and plant 2. At plant 1, DC is much higher than AC. At plant 2, the two are almost the same.*

To do this, we visualize the relationship between DC and AC for each plant. 
We plot DC and AC in the same frame, with all the data grouped by TIME. 

In [None]:
def show_cum_dc_ac(data, ax, label):
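    # Sum only the power columns; summing the non-numeric columns can fail on recent pandas versions.
    data = data.groupby('DATE_TIME')[['AC_POWER', 'DC_POWER']].sum().reset_index()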
    data['TIME'] = data['DATE_TIME'].dt.time
    data.plot(x='TIME', y=['AC_POWER', 'DC_POWER'], style='o', ax=ax)
    ax.set_title(label)

In [None]:
fig, ax = plt.subplots(ncols=3, nrows=1, figsize=(24,6))
show_cum_dc_ac(plant1_generation, ax[0], 'Fig1. AC vs DC at Plant 1')
show_cum_dc_ac(plant2_generation, ax[1], 'Fig2. AC vs DC at Plant 2')

tmp = plant1_generation.copy()
tmp['DC_POWER'] /= 10
show_cum_dc_ac(tmp, ax[2], 'Fig3. AC vs DC at Plant 1, DC divided by 10')

Now we want to see the DC to AC conversion rate of each plant.

In [None]:
def conv_rate(data):
    # Sum only the two power columns; the datetime column cannot be summed.
    tmp = data[['AC_POWER', 'DC_POWER']].sum()
    return round(tmp['AC_POWER'] / tmp['DC_POWER'] * 100, 2)

In [None]:
print("Conversion rate of DC to AC")
print(f"{conv_rate(plant1_generation)}% : Plant1")
print(f"{conv_rate(plant2_generation)}% : Plant2")

tmp = plant1_generation.copy()
tmp['DC_POWER'] /= 10
print(f"{conv_rate(tmp)}% : Plant1 DC divided by 10")

##### The answer to this question

Now it's clear that there is a huge gap between AC and DC at plant 1. <br>
Interestingly, if we divide DC by 10 at plant 1, it looks pretty similar to plant 2.<br>

I can think of two possible explanations for this huge gap.
1. If we trust the description of DC_POWER and AC_POWER on the [Data](https://www.kaggle.com/anikannal/solar-power-generation-data) page, both values are in kW. That would mean the inverters at plant 1 are extremely inefficient: the efficiency is only 9.78%. 
1. If we doubt the description, the data itself may be faulty. 

I go with the second explanation, for the following reasons.
1. It is unlikely that the conversion rate is this low. Usually the conversion rate should be over 80%. 
1. If 90% of the power were lost, where would it go? Unless some facility nearby uses DC directly, it would just turn into heat. If that were the case, the solar panels could be seriously damaged. 

Based on this assumption, we will use the plant 1 DC power divided by 10 from here on.

In [None]:
plant1_generation['DC_POWER'] /= 10
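
The arithmetic behind the factor of 10 can be sketched on toy numbers. The values below are made up so that the raw ratio matches the 9.78% seen above; `conversion_rate` and its `dc_scale` parameter are hypothetical names for illustration.

```python
import pandas as pd

def conversion_rate(frame, dc_scale=1.0):
    """AC/DC ratio in percent; DC_POWER is divided by dc_scale before the ratio is taken."""
    dc_total = frame["DC_POWER"].sum() / dc_scale
    ac_total = frame["AC_POWER"].sum()
    return round(ac_total / dc_total * 100, 2)

# Made-up readings where DC is recorded roughly 10 times larger than AC.
toy = pd.DataFrame({"DC_POWER": [10000.0, 5000.0], "AC_POWER": [978.0, 489.0]})

raw_rate = conversion_rate(toy)                    # DC taken as recorded
scaled_rate = conversion_rate(toy, dc_scale=10.0)  # DC divided by 10

print(f"{raw_rate}% as recorded, {scaled_rate}% with DC divided by 10")
```

A ratio just under 100% is what a working inverter would plausibly show, which is why the scaled reading looks more believable than the raw one.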

### Q: Plant 2 seems to be working less
Let's get to the next one.

*For both DC power and AC power, 50% of the plant 2 records are 0, which means the plant is not producing for some reason. What is the reason? And why does plant 2 yield less DC and AC power?*

To make the observation clearer, let's calculate the percentage of zeros. 

In [None]:
def rate(data, target):
    return len(data[data[target]==0]) / len(data) * 100

print('The percentage of zeros of DC, AC power')
print(f"{rate(plant1_generation, 'DC_POWER')}%: DC POWER at plant1")
print(f"{rate(plant2_generation, 'DC_POWER')}%: DC POWER at plant2")
print(f"{rate(plant1_generation, 'AC_POWER')}%: AC POWER at plant1")
print(f"{rate(plant2_generation, 'AC_POWER')}%: AC POWER at plant2")

Interestingly, the percentages for DC and AC are exactly the same. <br>
This gives me a hypothesis: is it happening because of the weather? <br>
As mentioned in [this discussion](https://www.kaggle.com/anikannal/solar-power-generation-data/discussion/185351), the two plants are far apart, over 500 km. 
Let's look at some sensor data.

If there's no irradiation, DC and AC power should be zero as well, right? So let's see the percentage of zeros in irradiation.

In [None]:
print('The percentage of zeros of irradiation')
print(f"{rate(plant1_sensor, 'IRRADIATION')}%: Plant 1")
print(f"{rate(plant2_sensor, 'IRRADIATION')}%: Plant 2")


Then, it's not the case. Plant 2 has less irradiation. We don't see any correlation.
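
Comparing the two percentages separately hides when the zeros occur. One way to look closer, sketched here on toy data, is to join generation and sensor records on DATE_TIME and count zero-power records only while the sensor reports irradiation. `daytime_zero_rate` is a hypothetical helper, and the sketch assumes one sensor per plant.

```python
import pandas as pd

def daytime_zero_rate(generation, sensor, target="DC_POWER"):
    """Percentage of records with zero output while IRRADIATION is above zero."""
    merged = generation.merge(
        sensor[["DATE_TIME", "IRRADIATION"]], on="DATE_TIME", how="inner"
    )
    daytime = merged[merged["IRRADIATION"] > 0]
    if daytime.empty:
        return float("nan")
    return (daytime[target] == 0).mean() * 100

# Toy data: inv_b produces nothing at noon even though the sun is out.
times = pd.to_datetime(
    ["2020-05-15 00:00", "2020-05-15 09:00", "2020-05-15 12:00", "2020-05-15 15:00"]
)
sensor = pd.DataFrame({"DATE_TIME": times, "IRRADIATION": [0.0, 0.4, 0.9, 0.5]})
generation = pd.DataFrame({
    "DATE_TIME": list(times) * 2,
    "SOURCE_KEY": ["inv_a"] * 4 + ["inv_b"] * 4,
    "DC_POWER": [0.0, 400.0, 900.0, 500.0, 0.0, 400.0, 0.0, 500.0],
})

rate_pct = daytime_zero_rate(generation, sensor)
print(f"{rate_pct:.1f}% of daytime records are zero")
```

Night-time zeros are excluded here, so a high value points at inverters that are idle while there is sunlight.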

How about taking a look at the DC and AC power of each inverter on each day? If some inverters at plant 1 didn't work, this could happen. <br>
Let's make some graphs.<br>

*I show only the DC power transition, because AC shows a pretty similar transition.*

In [None]:
chars = ['--', '-.', ':', '-o', '-v', '-s']
def show_power_transition(data, fr, target, ax, step=6):
    data = data.groupby(['SOURCE_KEY', 'DATE']).agg({'DC_POWER':'sum', 'AC_POWER':'sum'}).reset_index()
    for i, source_key in enumerate(data['SOURCE_KEY'].unique()[fr:fr+step]):
        tmp = data[data['SOURCE_KEY'] == source_key]
        ax.plot(tmp['DATE'], tmp[target], chars[i], label=source_key)
    ax.legend()

In [None]:
fig, ax = plt.subplots(ncols=1, nrows=4, figsize=(32,20))

data = plant1_generation
label = 'DC_POWER'
show_power_transition(data, 0, label, ax[0])
show_power_transition(data, 6, label, ax[1])
show_power_transition(data, 12, label, ax[2])
show_power_transition(data, 18, label, ax[3])

In [None]:
fig, ax = plt.subplots(ncols=1, nrows=4, figsize=(32,20))

data = plant2_generation
label = 'DC_POWER'
show_power_transition(data, 0, label, ax[0])
show_power_transition(data, 6, label, ax[1])
show_power_transition(data, 12, label, ax[2])
show_power_transition(data, 18, label, ax[3])

I feel that there are some similar patterns between inverters. Let's classify them manually.

In [None]:
fig, ax = plt.subplots(ncols=1, nrows=5, figsize=(32,30))

data = plant2_generation
label = 'DC_POWER'


tmp = data[data['SOURCE_KEY'].isin(['PeE6FRyGXUgsRhN','V94E5Ben1TlhnDV','oZZkBaNadn6DNKz','xoJJ8DcxJEcupym'])]
show_power_transition(tmp, 0, 'DC_POWER', ax[0])

tmp = data[data['SOURCE_KEY'].isin(['Et9kgGMDl729KT4','LYwnQax7tkwH5Cb', 'Quc1TzYxW2pYoWX','rrq4fwE8jgrTyWY','q49J1IKaHRwDQnt'])]
show_power_transition(tmp, 0, 'DC_POWER', ax[1])

tmp = data[data['SOURCE_KEY'].isin(['81aHJ1q11NBPMrL','9kRcWv60rDACzjR','LlT2YUhhzqhg5Sw','WcxssY2VbP4hApt','vOuJvMaM2sgwLmb'])]
show_power_transition(tmp, 0, 'DC_POWER', ax[2])

tmp = data[data['SOURCE_KEY'].isin(['4UPUqMRk7TRMgml','Mx2yZCDsyf6DPfv', 'Qf4GUc1pJu5T6c6','oZ35aAeoifZaQzV'])]
show_power_transition(tmp, 0, 'DC_POWER', ax[3])

tmp = data[data['SOURCE_KEY'].isin(['IQ2d7wF4YD8zU1Q','NgDl19wMapZy17u','mqwcsP2rE7J0TFp', 'xMbIugepa2P7lBB'])]
show_power_transition(tmp, 0, 'DC_POWER', ax[4])

Some of them are pretty similar! Maybe they are located at the same place. 
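
Grouping by eye could be backed up with numbers. As a rough sketch (not what was done above), the daily DC totals can be pivoted into one column per inverter and correlated pairwise. The data is made up and `daily_profile_correlation` is a hypothetical helper name.

```python
import pandas as pd

def daily_profile_correlation(generation, target="DC_POWER"):
    """Pairwise correlation of daily totals between inverters."""
    daily = (
        generation.groupby(["DATE", "SOURCE_KEY"])[target].sum().unstack("SOURCE_KEY")
    )
    return daily.corr()

# Toy data: inv_a and inv_b move together, inv_c moves the opposite way.
days = ["2020-05-15", "2020-05-16", "2020-05-17", "2020-05-18"]
toy = pd.DataFrame({
    "DATE": days * 3,
    "SOURCE_KEY": ["inv_a"] * 4 + ["inv_b"] * 4 + ["inv_c"] * 4,
    "DC_POWER": [10.0, 20.0, 30.0, 40.0, 11.0, 21.0, 29.0, 41.0, 40.0, 30.0, 20.0, 10.0],
})

corr = daily_profile_correlation(toy)
print(corr.round(3))
```

Inverters whose correlation is close to 1 are candidates for the same group. On real data the weather drives all inverters in a similar way, so the differences between groups may be small.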

In [None]:
all_inverters = set(plant2_generation['SOURCE_KEY'].unique())
tmp = plant2_generation[(datetime.date(2020, 5, 21) <= plant2_generation['DATE']) & (plant2_generation['DATE'] <= datetime.date(2020, 5, 28))]

print("The following inverters don't have any data between May 21 and 28")
print(all_inverters - set(tmp['SOURCE_KEY'].unique()))

Now we have some interesting findings.

1. At plant 1, 2 faulty inverters are found. Their ids are '1BY6WEcLGh8j5v7' and 'bvBOhCH3iADSZry'.
1. At plant 2, 4 inverters are missing 8 days of data. Their ids are 'IQ2d7wF4YD8zU1Q', 'mqwcsP2rE7J0TFp', 'xMbIugepa2P7lBB' and 'NgDl19wMapZy17u'.
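
The gap was spotted on the graphs and then checked against a fixed date range. A more general check, sketched below on toy data, counts records per inverter and day and reports the days with none. `missing_days` is a hypothetical helper name.

```python
import datetime
import pandas as pd

def missing_days(generation):
    """Map each SOURCE_KEY to the dates on which it has no records at all."""
    counts = (
        generation.groupby(["SOURCE_KEY", "DATE"]).size().unstack("DATE", fill_value=0)
    )
    gaps = {}
    for source_key, row in counts.iterrows():
        days = [day for day, n in row.items() if n == 0]
        if days:
            gaps[source_key] = days
    return gaps

# Toy data: inv_b has no records on May 22.
d21, d22, d23 = (datetime.date(2020, 5, day) for day in (21, 22, 23))
toy = pd.DataFrame({
    "SOURCE_KEY": ["inv_a", "inv_a", "inv_a", "inv_b", "inv_b"],
    "DATE": [d21, d22, d23, d21, d23],
})

gaps = missing_days(toy)
print(gaps)
```

Note the limitation: a day on which every inverter is missing never shows up as a column, so it would go unnoticed by this check.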

So at plant 2, some data is missing. Is that why plant 2 has a higher ratio of zeros? Let's look at it from a different perspective.

In [None]:
tmp1 = plant1_generation[~((datetime.date(2020, 5, 20) <= plant1_generation['DATE']) & (plant1_generation['DATE'] <= datetime.date(2020, 5, 29)))]
tmp2 = plant2_generation[~((datetime.date(2020, 5, 20) <= plant2_generation['DATE']) & (plant2_generation['DATE'] <= datetime.date(2020, 5, 29)))]

print('The percentage of zeros of DC, AC power')
print(f"{rate(tmp1, 'DC_POWER')}%: DC POWER at plant1")
print(f"{rate(tmp2, 'DC_POWER')}%: DC POWER at plant2")
print(f"{rate(tmp1, 'AC_POWER')}%: AC POWER at plant1")
print(f"{rate(tmp2, 'AC_POWER')}%: AC POWER at plant2")

##### The answer to this question

**It remains a mystery for now... let's get back to it later.**

## We've got new questions

1. At plant 1, why are these 2 inverters underperforming, especially on June 14?
1. At plant 2, why do 4 inverters have no data between May 21 and 28? (Is it even possible to find an answer?)

Let's try answering them later.

### Q: Plant 2 looks far too good at yielding power considering its DC and AC power
*Even though plant 2 yields less DC and AC power, its TOTAL_YIELD is far higher, by about two digits. Why?*

We already know the answer to this.<br>
TOTAL_YIELD is the cumulative sum of power since the beginning.<br>
Now that we have subtracted the initial values, TOTAL_YIELD is similar across inverters.<br>
There are some outliers like "Quc1TzYxW2pYoWX"; we will investigate them later.

In [None]:
_ = show_ranking_total_yield()

## We've got another question here

1. The 4 best performing inverters at plant 2 exactly match the inverters that are missing 8 days of data: 'mqwcsP2rE7J0TFp', 'IQ2d7wF4YD8zU1Q', 'NgDl19wMapZy17u' and 'xMbIugepa2P7lBB'. Is this just a coincidence?

### Q: Some inverters seem to be underperforming. Why?
Okay, now it's time for the next one.

*The inverter ranking shows a huge gap in TOTAL_YIELD_init between the best inverter and the worst one. Judging by TOTAL_YIELD_all, the underperforming inverters are not always the old ones. So what is the reason?*


Let's visualize the transition of DC power.

In [None]:
fig, ax = plt.subplots(ncols=1, nrows=6, figsize=(100,22))

def show_comparison(data, data2, inverter_id, label, ax):
    tmp = data[data['SOURCE_KEY']==inverter_id].merge(data2, on='DATE_TIME', how='left')
    ax.plot(tmp['DATE_TIME'], tmp['DC_POWER'], label=label)
    ax.legend()

show_comparison(plant2_generation, plant2_sensor, 'Mx2yZCDsyf6DPfv', 'Mx2yZCDsyf6DPfv (Better)', ax[0])
show_comparison(plant2_generation, plant2_sensor, 'Quc1TzYxW2pYoWX', 'Quc1TzYxW2pYoWX (Worst1)', ax[0])

show_comparison(plant2_generation, plant2_sensor, 'Mx2yZCDsyf6DPfv', 'Mx2yZCDsyf6DPfv (Better)', ax[1])
show_comparison(plant2_generation, plant2_sensor, 'Et9kgGMDl729KT4', 'Et9kgGMDl729KT4 (Worst2)', ax[1])

show_comparison(plant2_generation, plant2_sensor, 'Mx2yZCDsyf6DPfv', 'Mx2yZCDsyf6DPfv (Better)', ax[2])
show_comparison(plant2_generation, plant2_sensor, 'LYwnQax7tkwH5Cb', 'LYwnQax7tkwH5Cb (Worst3)', ax[2])

show_comparison(plant2_generation, plant2_sensor, 'Mx2yZCDsyf6DPfv', 'Mx2yZCDsyf6DPfv (Better)', ax[3])
show_comparison(plant2_generation, plant2_sensor, 'rrq4fwE8jgrTyWY', 'rrq4fwE8jgrTyWY (Worst3)', ax[3])

show_comparison(plant2_generation, plant2_sensor, 'Mx2yZCDsyf6DPfv', 'Mx2yZCDsyf6DPfv (Better)', ax[4])
show_comparison(plant2_generation, plant2_sensor, 'xoJJ8DcxJEcupym', 'xoJJ8DcxJEcupym (Worst4)', ax[4])

show_comparison(plant2_generation, plant2_sensor, 'Mx2yZCDsyf6DPfv', 'Mx2yZCDsyf6DPfv (Better)', ax[5])
show_comparison(plant2_generation, plant2_sensor, 'LlT2YUhhzqhg5Sw', 'LlT2YUhhzqhg5Sw (Worst5)', ax[5])

We can see that, from time to time, they don't yield any power in the middle of the day!<br>
We also found that some data is missing. <br>
So why is this happening? Let's dig deeper.

In [None]:
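def show_iverter_to_environment(data, inverter_id, target, ax1, ax2, sensor=None):
    # Pair the inverter with the sensor of its own plant
    # (plant 2 sensor unless plant 1 generation data is passed in).
    if sensor is None:
        sensor = plant1_sensor if data is plant1_generation else plant2_sensor
    tmp = data[data['SOURCE_KEY']==inverter_id].merge(sensor, on='DATE_TIME', how='left')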

    c1 = cm.Set1.colors[1]
    c2 = cm.Set1.colors[0]
    
    ax1.plot(tmp['DATE_TIME'], tmp['DC_POWER'], label='DC_POWER', color=c1)
    ax2.plot(tmp['DATE_TIME'], tmp[target], label=target, color=c2)

    ax1.tick_params(axis='y', colors=c1)
    ax2.tick_params(axis='y', colors=c2)
    
    handler1, label1 = ax1.get_legend_handles_labels()
    handler2, label2 = ax2.get_legend_handles_labels()
    ax1.legend(handler1 + handler2, label1 + label2, loc=2, borderaxespad=0.)

fig, ax1 = plt.subplots(ncols=1, nrows=3, figsize=(100,12))

ax2 = ax1[0].twinx()
show_iverter_to_environment(plant2_generation, 'Quc1TzYxW2pYoWX', 'MODULE_TEMPERATURE', ax1[0], ax2)
ax2 = ax1[1].twinx()
show_iverter_to_environment(plant2_generation, 'Quc1TzYxW2pYoWX', 'IRRADIATION', ax1[1], ax2)
ax2 = ax1[2].twinx()
show_iverter_to_environment(plant2_generation, 'Quc1TzYxW2pYoWX', 'AMBIENT_TEMPERATURE', ax1[2], ax2)

Even when there is irradiation, the inverter stopped working in the middle of the day, as we assumed. <br>
It usually happens when MODULE_TEMPERATURE is high, but not always. <br>
I think we need domain knowledge to dig into this.<br>

##### The answer to this question
From time to time, these underperforming inverters stop working in the middle of the day.<br>
It may be because of module temperature, but we are not sure.
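
The module temperature idea could be checked with numbers instead of by eye. Here is a rough sketch on toy data: keep only the records with meaningful irradiation, split them into "outage" (DC_POWER is 0) and "running", and compare the mean MODULE_TEMPERATURE. `outage_temperature_summary` is a hypothetical helper, and the `min_irradiation` threshold of 0.1 is an arbitrary assumption.

```python
import pandas as pd

def outage_temperature_summary(inverter, sensor, min_irradiation=0.1):
    """Count and mean MODULE_TEMPERATURE for outage vs running records in sunlight."""
    merged = inverter.merge(sensor, on="DATE_TIME", how="inner")
    sunny = merged[merged["IRRADIATION"] >= min_irradiation]
    state = (sunny["DC_POWER"] == 0).map({True: "outage", False: "running"}).rename("STATE")
    return sunny.groupby(state)["MODULE_TEMPERATURE"].agg(["count", "mean"])

# Toy data for one hypothetical inverter that halts around noon.
times = pd.to_datetime([
    "2020-05-15 06:00", "2020-05-15 09:00", "2020-05-15 12:00",
    "2020-05-15 13:00", "2020-05-15 16:00",
])
sensor = pd.DataFrame({
    "DATE_TIME": times,
    "IRRADIATION": [0.0, 0.5, 0.9, 0.8, 0.4],
    "MODULE_TEMPERATURE": [22.0, 40.0, 58.0, 55.0, 38.0],
})
inverter = pd.DataFrame({"DATE_TIME": times, "DC_POWER": [0.0, 500.0, 0.0, 0.0, 400.0]})

summary = outage_temperature_summary(inverter, sensor)
print(summary)
```

A clearly higher mean for the outage rows would support the temperature explanation. It would not prove it, because high module temperature and high irradiation tend to occur together.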

### Q: There are 2 underperforming inverters at plant 1

*At plant 1, why are these 2 inverters underperforming, especially on June 14?*

Let's make some graphs.

In [None]:
fig, ax = plt.subplots(ncols=1, nrows=2, figsize=(100,8))

show_comparison(plant1_generation, plant1_sensor, 'adLQvlD726eNBSB', 'adLQvlD726eNBSB (Best)', ax[0])
show_comparison(plant1_generation, plant1_sensor, 'bvBOhCH3iADSZry', 'bvBOhCH3iADSZry (Worst1)', ax[0])

show_comparison(plant1_generation, plant1_sensor, 'adLQvlD726eNBSB', 'adLQvlD726eNBSB (Best)', ax[1])
show_comparison(plant1_generation, plant1_sensor, '1BY6WEcLGh8j5v7', '1BY6WEcLGh8j5v7 (Worst2)', ax[1])

So basically these inverters perform as well as the best performing one.<br>
But on June 7 and June 14, we can see that they stopped working for a while in the middle of the day.<br>
And it happened to both of them at the same time! That's weird. There must be some reason. <br>

In [None]:
fig, ax1 = plt.subplots(ncols=1, nrows=3, figsize=(100,12))

# Named inverter_id rather than id, to avoid shadowing the built-in id().
inverter_id = '1BY6WEcLGh8j5v7'

ax2 = ax1[0].twinx()
show_iverter_to_environment(plant1_generation, inverter_id, 'MODULE_TEMPERATURE', ax1[0], ax2)
ax2 = ax1[1].twinx()
show_iverter_to_environment(plant1_generation, inverter_id, 'IRRADIATION', ax1[1], ax2)
ax2 = ax1[2].twinx()
show_iverter_to_environment(plant1_generation, inverter_id, 'AMBIENT_TEMPERATURE', ax1[2], ax2)

##### The answer to this question

We don't know the exact reason, but these two inverters stop working during the daytime.<br>
Basically they perform as well as the others, but because of the halts they yield less power. 

## to be continued...

I'm gonna add a bit more.