
# PLANT HEALTH DATASET ANALYSIS

## Presented by: Esteban Perez
## Company: Adaviv

## Data preparation

In [40]:
# Import libraries

import pandas as pd
import numpy as np
import math
from datetime import datetime

In [46]:
# Import dataset into dataframe

csv_url = 'https://raw.githubusercontent.com/estebanpv/practice/main/Sum_of_Number_Of_Pla_1674857456444_1_1_1.csv'
df_plant_health = pd.read_csv(csv_url,sep=',',header=0, parse_dates=['Date'], index_col=False)

In [49]:
# Check whether any rows contain missing data

print("Shape of original dataset: ")
print(df_plant_health.shape)

print("Shape of dataset without missing data: ")
# dropna() returns a new DataFrame; the original is left unchanged
df_without_missing = df_plant_health.dropna()
print(df_without_missing.shape)

print("If both shapes are the same, it means the dataset has no empty data.")

Shape of original dataset: 
(232652, 8)
Shape of dataset without missing data: 
(232652, 8)
If both shapes are the same, it means the dataset has no empty data.
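
A per-column count of missing values complements the shape comparison, because it also shows where any gaps are. The sketch below uses a small made-up frame (the `toy` name and its values are illustrative assumptions, not taken from the dataset) and relies on the fact that `dropna()` returns a new DataFrame rather than modifying the original.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names mirror the dataset, values are made up.
toy = pd.DataFrame({
    "Room": [1, 2, 3],
    "Issue": ["Runts", np.nan, "Nutrient Def"],
})

# Count missing values per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# dropna() returns a new frame; `toy` itself is unchanged.
toy_complete = toy.dropna()
print(toy.shape, toy_complete.shape)
```

Applied to `df_plant_health`, a result of all zeros from `isna().sum()` would confirm that no column has missing data.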


## Preliminary analysis

In [52]:
df_plant_health.head()

| | Date | Week | Room | Crop Week | Cultivar | Issue | coloredSeverity | Number of plants (SUM) | Week_Of_Observation |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-27 | 4 | 1 | 4 | LM | Nutrient Def | 1.Green | 0 | 63 |
| 1 | 2023-01-27 | 4 | 1 | 4 | LM | Nutrient Def | 2.Yellow | 1 | 63 |
| 2 | 2023-01-27 | 4 | 1 | 4 | LM | Nutrient Def | 3.Orange | 0 | 63 |
| 3 | 2023-01-27 | 4 | 1 | 4 | LM | Nutrient Def | 4.Red | 0 | 63 |
| 4 | 2023-01-27 | 4 | 1 | 4 | LM | Runts | 1.Green | 0 | 63 |


In [50]:
# Calculate "Week_Of_Observation": the number of weeks elapsed since the first observation, starting at 1.
# This is necessary because "Week" repeats every year: Week 4 of 2022 is not the same as Week 4 of 2023.
# "Week_Of_Observation" is useful for studying how phenomena evolve from one year to the next.
# "Week" is still useful because it reflects seasonal conditions relevant to plant health: humidity, temperature and sunlight.

# Take min/max of the Date column only, instead of reducing every column of the frame
Date_min = df_plant_health['Date'].min()
Date_max = df_plant_health['Date'].max()

# Vectorized equivalent of math.floor(delta.days / 7) + 1 applied row by row
df_plant_health['Week_Of_Observation'] = (df_plant_health['Date'] - Date_min).dt.days // 7 + 1
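
To illustrate why a week-of-year value alone is ambiguous, here is a minimal sketch on a handful of sample dates. The `toy` frame and its column names are hypothetical, and the ISO week from `dt.isocalendar()` is only a stand-in for the dataset's `Week` column, whose numbering convention is not stated in this notebook.

```python
import pandas as pd

# Hypothetical sample dates; the first and last match dates shown in this notebook.
toy = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-11-16", "2021-11-22", "2021-11-23", "2022-01-27", "2023-01-27",
    ])
})

start = toy["Date"].min()

# ISO week of the year, used here only as a stand-in for the "Week" column.
toy["iso_week"] = toy["Date"].dt.isocalendar().week.astype(int)

# Weeks elapsed since the first date, counting the first week as 1.
toy["week_of_observation"] = (toy["Date"] - start).dt.days // 7 + 1

print(toy)
```

The last two dates fall in the same week of the year but are a year apart, so they get different week-of-observation values (11 and 63).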

In [53]:
df_plant_health.describe()

| | Week | Room | Crop Week | Number of plants (SUM) | Week_Of_Observation |
|---|---|---|---|---|---|
| count | 232652.0 | 232652.0 | 232652.0 | 232652.0 | 232652.0 |
| mean | 27.66293 | 4.72988 | 5.736035 | 2.170645 | 34.223183 |
| std | 16.51359 | 2.330381 | 3.109653 | 29.286162 | 17.236033 |
| min | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 |
| 25% | 13.0 | 3.0 | 3.0 | 0.0 | 19.0 |
| 50% | 27.0 | 5.0 | 6.0 | 0.0 | 34.0 |
| 75% | 43.0 | 7.0 | 8.0 | 0.0 | 50.0 |
| max | 53.0 | 8.0 | 13.0 | 1800.0 | 63.0 |


### Initial observations



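*   Week
  *   Week of the year. It reflects seasonal conditions relevant to plant health: humidity, temperature and sunlight.
  *   Values range from 1 to 53, so at least one year in the data has a week numbered 53.

*   Room
  *   Room numbers range from 1 to 8, which suggests there are 8 rooms.

*   Crop Week
  *   Values range from 1 to 13, so a crop cycle may last up to 13 weeks.

*   Number of plants (SUM)
  *   In at least 75% of the observations, no plants were identified with issues (the 75th percentile is 0).
  *   There is an outlying value: 1,800, against a mean of about 2.17. It could be a transcription error or a real incident in the facility.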
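
The two observations about `Number of plants (SUM)` can be checked directly. The following is a minimal sketch on made-up data (the `toy` frame and its values are assumptions chosen to resemble the summary above, not real records): it computes the share of zero counts and pulls the largest counts together with their context columns for manual review.

```python
import pandas as pd

col = "Number of plants (SUM)"

# Hypothetical toy data shaped like the summary above: mostly zeros, one extreme value.
toy = pd.DataFrame({
    "Room": [1, 1, 2, 2, 3, 3, 4, 4],
    col: [0, 0, 0, 0, 1, 0, 0, 1800],
})

# Share of records with no affected plants.
zero_share = (toy[col] == 0).mean()
print(f"Share of zero counts: {zero_share:.2f}")

# The largest counts, with their context columns, for manual review.
top_rows = toy.nlargest(3, col)
print(top_rows)
```

On the real dataset, looking at the `Date`, `Room` and `Issue` of the top rows may help decide whether the 1,800 value is a data-entry error or a genuine event.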



In [32]:
df_plant_health.dtypes

Date                      datetime64[ns]
Week                               int64
Room                               int64
Crop Week                          int64
Cultivar                          object
Issue                             object
coloredSeverity                   object
Number of plants (SUM)             int64
Week_Of_Observation                int64
dtype: object

In [37]:
# Earliest observation date in the dataset
df_plant_health['Date'].min()

Timestamp('2021-11-16 00:00:00')

