# INTRODUCTION

**Before diving into the notebook, here is a brief introduction to the data used.**

**What is this?:** Data representing the weather conditions on Mars from Sol 1 (August 7, 2012 on Earth) to Sol 1895 (February 27, 2018 on Earth).

**Source(s) & Methodology:** This data was measured and transmitted via the Rover Environmental Monitoring Station (REMS) on board the Curiosity Rover. The data was made publicly available by NASA’s Mars Science Laboratory and the Centro de Astrobiología (CSIC-INTA). The Centro de Astrobiología also provides a widget and a disclaimer regarding the data collected by Curiosity.


**Attributes Description**

<br/>•id - The identification number of a single transmission

<br/>•terrestrial_date - The date on Earth (formatted as month/day/year or m/dd/yy).

<br/>•ls - The solar longitude or the Mars-Sun angle, measured from the Northern Hemisphere. In the Northern Hemisphere, the spring equinox is when ls = 0. Since Curiosity is in the Southern Hemisphere, the following ls values are of importance: <br/>• ls = 0: autumnal equinox <br/>• ls = 90 : winter solstice <br/>• ls = 180 : spring equinox <br/>• ls = 270 : summer solstice

<br/>•month -	The Martian Month. Similarly to Earth, Martian time can be divided into 12 months.

<br/>•min_temp -	The minimum temperature (in °C) observed during a single Martian sol.

<br/>•max_temp -	The maximum temperature (in °C) observed during a single Martian sol.	

<br/>•pressure -	The atmospheric pressure (Pa) in Curiosity's location on Mars.

<br/>•wind_speed - The average wind speed (m/s) measured in a single sol. Note: Wind speed data has not been transmitted to Earth since Sol 1485. Missing values are coded as NaN.

<br/>•atmo_opacity - Description of the overall weather conditions on Mars for a given sol based on atmospheric opacity (e.g., Sunny).
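
The `ls` boundaries listed above can be turned into a small lookup. This is only an illustrative sketch: the function name `southern_season` is hypothetical (it is not part of the dataset or of any library), and it assumes `ls` is given in degrees.

```python
def southern_season(ls: float) -> str:
    """Map solar longitude (degrees) to the Southern Hemisphere season on Mars."""
    ls = ls % 360  # wrap into [0, 360)
    if ls < 90:
        return "autumn"   # starts at the autumnal equinox (ls = 0)
    if ls < 180:
        return "winter"   # starts at the winter solstice (ls = 90)
    if ls < 270:
        return "spring"   # starts at the spring equinox (ls = 180)
    return "summer"       # starts at the summer solstice (ls = 270)


for value in (0, 90, 180, 270):
    print(value, southern_season(value))
```

Once the CSV is loaded, the same function could be applied to the `ls` column (for example with `data['ls'].apply(southern_season)`) to add a season label.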

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)



import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



In [None]:
# let's start by reading the CSV file and then take a peek at the data

data=pd.read_csv('/kaggle/input/mars-weather-data/mars-weather.csv')

data.sample(5)

In [None]:
data.describe()

In [None]:
data.info()

In [None]:
# as we can see, some of the columns have a lot of missing values, e.g. wind_speed

# let's look at the non-numeric columns and see what we have

obj_cols=data.select_dtypes(include='object')

obj_cols.sample(5)

In [None]:
data.head()

In [None]:
# the terrestrial_date column is stored as strings, so let's convert it to datetime

data['terrestrial_date']=pd.to_datetime(data['terrestrial_date'])

data.terrestrial_date.sample(5)

**This notebook is for learning univariate prediction, so we will try to predict the temperature.**

**Let's take a deeper look at the temperature feature.**

In [None]:
# list the columns to locate the temperature feature

data.columns

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(20,10))
# select the column by name rather than by position; normalize=True gives fractions, not percentages
data['max_temp'].value_counts(normalize=True,sort=True).plot(kind='bar')
plt.xlabel('Max temperature (°C)')
plt.ylabel('Fraction of sols')
plt.show()

In [None]:
# So we can see that most of the time the maximum temperature lies in the range -35 to 11 °C

# let's check if we have any missing values in our target column

data['max_temp'].isna().sum()

# As we can see, there are 27 rows with missing values for the temperature feature

In [None]:
# let's impute the missing values with the mean of the month
# note: this overwrites the Martian 'month' column with the Earth calendar month (a year-month period)

data['month']=pd.to_datetime(data.terrestrial_date).dt.to_period('M')

data.month.value_counts()

In [None]:
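# mean max_temp for each calendar month
monthly_mean=data.groupby(by='month')['max_temp'].mean()
# rows where max_temp is missing
missing=data['max_temp'].isna()
# fill each missing value with the mean of its month
data.loc[missing,'max_temp']=data.loc[missing,'month'].map(monthly_mean)
data['max_temp'].isna().sum()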
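
The month-wise imputation above can also be written with `groupby(...).transform`, which broadcasts each group's mean back onto its rows. The sketch below runs on a small made-up table (the values are not from the Mars dataset) so that the effect is easy to see.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real table (values are made up).
toy = pd.DataFrame({
    "terrestrial_date": pd.to_datetime(
        ["2013-01-01", "2013-01-02", "2013-01-03", "2013-02-01", "2013-02-02"]
    ),
    "max_temp": [-10.0, np.nan, -14.0, -20.0, np.nan],
})

# Calendar month of each observation, as a pandas Period.
toy["month"] = toy["terrestrial_date"].dt.to_period("M")

# Mean of each month, broadcast back to every row of that month.
monthly_mean = toy.groupby("month")["max_temp"].transform("mean")

# Fill only the missing entries with their month's mean.
toy["max_temp"] = toy["max_temp"].fillna(monthly_mean)
print(toy)
```

If a month has no observed values at all, its mean is NaN and the gap stays unfilled, so it is worth re-checking `isna().sum()` afterwards.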

In [None]:
new_df=pd.DataFrame(list(data['max_temp']),index=data.terrestrial_date,columns=['Maxtemperature'])

In [None]:
new_df=new_df.resample('D').mean()

In [None]:
new_df

In [None]:
month_df=new_df.resample('M').mean()
month_df

In [None]:
year_df=new_df.resample('Y').mean()
year_df

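**The yearly averages appear to increase over the years, although 2012 and 2018 cover only part of a year (the data runs from August 7, 2012 to February 27, 2018), so their means are not directly comparable with those of full years.**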
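
Because the series starts and ends mid-year, a yearly mean is easier to interpret next to the number of days behind it. The sketch below uses a synthetic daily series (not the real measurements) and groups by calendar year to report both figures; only the column name `Maxtemperature` is taken from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic daily series from 2012-08-07 to 2013-12-31 (values are made up).
idx = pd.date_range("2012-08-07", "2013-12-31", freq="D")
toy = pd.DataFrame({"Maxtemperature": np.linspace(-30.0, -5.0, len(idx))}, index=idx)

# Per-year mean together with the number of days that went into it.
summary = toy.groupby(toy.index.year)["Maxtemperature"].agg(["mean", "count"])
print(summary)
```

A year with far fewer days than the others samples only part of the seasonal cycle, so its mean should be compared with caution.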

In [None]:
plt.figure(figsize=(20,10))
plt.plot(new_df)
plt.show()

In [None]:
from keras.layers import Bidirectional,LSTM,Dense,Flatten,Conv1D,MaxPooling1D,Dropout,RepeatVector
from keras.models import Sequential
from keras.callbacks import EarlyStopping,ReduceLROnPlateau

early_stop=EarlyStopping(monitor='loss',patience=5)

In [None]:
model=Sequential([Conv1D(100,kernel_size=3,input_shape=(30,1),activation='relu'),
                  Conv1D(100,kernel_size=3),
                  Conv1D(100,kernel_size=3),
                  MaxPooling1D(2),
                  Flatten(),
                  RepeatVector(30),
                  LSTM(128,activation='relu',return_sequences=True),
                  LSTM(128,activation='relu',return_sequences=True),
                  Bidirectional(LSTM(64,activation='relu')),
                  Dense(128,activation='relu'),
                  Dense(1)])

# accuracy is not meaningful for a regression target, so track mean absolute error instead
model.compile(optimizer='adam',loss='mse',metrics=['mae'])
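
To see how the 30-step input shrinks through the convolutional front end, the sequence lengths can be traced by hand. The sketch below assumes the Keras defaults used in the model (`padding='valid'`, convolution stride 1, pooling stride equal to the pool size); the helper names are hypothetical and are not Keras functions.

```python
def conv1d_out_len(length: int, kernel_size: int, stride: int = 1) -> int:
    """Output length of a 1-D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def pool1d_out_len(length: int, pool_size: int) -> int:
    """Output length of 1-D max pooling with stride == pool_size and 'valid' padding."""
    return (length - pool_size) // pool_size + 1


length = 30                      # input window of 30 days
for _ in range(3):               # three Conv1D layers with kernel_size=3
    length = conv1d_out_len(length, 3)
length = pool1d_out_len(length, 2)   # MaxPooling1D(2)
flat = length * 100              # Flatten over 100 filters

print(length, flat)
```

Under these assumptions, `RepeatVector(30)` then repeats the flattened vector 30 times, so the first LSTM layer receives 30 identical timesteps of that width.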

In [None]:
new_df1=pd.DataFrame(list(data['max_temp']), index=data['terrestrial_date'], columns=['temp'])

In [None]:
new_df1

In [None]:
new_df1=new_df1.resample('D').mean()
new_df1.temp.isna().sum()

In [None]:
new_df1.fillna(data['max_temp'].mean(),inplace=True)
new_df1.temp.isna().sum()

In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler=MinMaxScaler(feature_range=(-1,1))

In [None]:
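# note: the scaler is fitted on the full series, so test-period min/max values influence the scaling
scaled_data=scaler.fit_transform(new_df1)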
scaled_data[:5]
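
Min-max scaling to the range (-1, 1) is a linear map defined by the minimum and maximum of the data it is fitted on. The sketch below spells that formula out in NumPy and, as an assumption that differs from the cell above, fits it on a training slice only; `fit_minmax` is a hypothetical helper, not a scikit-learn function, and the values are synthetic.

```python
import numpy as np


def fit_minmax(train: np.ndarray, low: float = -1.0, high: float = 1.0):
    """Return (scale, inverse) functions fitted on the training values only."""
    lo, hi = float(train.min()), float(train.max())

    def scale(x):
        # Map [lo, hi] linearly onto [low, high].
        return (x - lo) / (hi - lo) * (high - low) + low

    def inverse(z):
        # Undo the mapping to recover the original units.
        return (z - low) / (high - low) * (hi - lo) + lo

    return scale, inverse


series = np.array([-30.0, -20.0, -10.0, 0.0, 5.0])  # synthetic temperatures
train = series[:3]                                   # pretend the first 3 values are training data
scale, inverse = fit_minmax(train)
scaled = scale(series)
print(scaled)
```

Values outside the training range map outside (-1, 1), which is expected when the scaler has not seen the test period.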

In [None]:
steps=30
inp1=[]
out1=[]

for i in range(len(scaled_data)-steps):
    inp1.append(scaled_data[i:i+steps])
    out1.append(scaled_data[i+steps])
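
The loop above builds a supervised dataset from a single series: each input is a window of `steps` consecutive values and the target is the value that follows it. A minimal sketch of the same idea on a toy array, where `make_windows` is a hypothetical helper name:

```python
import numpy as np


def make_windows(values: np.ndarray, steps: int):
    """Turn an (n, 1) array into inputs (n - steps, steps, 1) and targets (n - steps, 1)."""
    inputs = np.stack([values[i:i + steps] for i in range(len(values) - steps)])
    targets = values[steps:]  # the value right after each window
    return inputs, targets


toy = np.arange(6, dtype=float).reshape(-1, 1)  # 0, 1, ..., 5 as a column
X, y = make_windows(toy, steps=3)
print(X.shape, y.shape)
print(X[0].ravel(), y[0])
```

With six values and `steps=3` this yields three windows; the first window holds 0, 1, 2 and its target is 3.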

In [None]:
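# convert the lists of windows and targets to arrays
inp1=np.asarray(inp1)
out1=np.asarray(out1)
# chronological split: the first 500 windows for training, the rest for testing
x_train1=inp1[:500,:,:]
x_test1=inp1[500:,:,:]
y_train1=out1[:500]
y_test1=out1[500:]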

In [None]:
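# pass the early_stop callback defined earlier so training halts when the loss stops improving
model.fit(x_train1,y_train1,epochs=20,callbacks=[early_stop])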

In [None]:
predicted=model.predict(x_test1)

In [None]:
predicted1=scaler.inverse_transform(predicted)

In [None]:
y_test2=scaler.inverse_transform(y_test1)

In [None]:
plt.figure(figsize=(20,5))
plt.plot(predicted1,'r',label='predicted')
plt.plot(y_test2,'g',label='actual')
plt.legend()
plt.show()
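
A plot gives a visual impression of the fit; error metrics and a simple baseline make it measurable. The sketch below computes RMSE and MAE and compares them with a persistence baseline (predicting each day as the previous day's value). The arrays are synthetic stand-ins, not the notebook's results, and the helper names are hypothetical.

```python
import numpy as np


def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mae(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(actual - predicted)))


# Synthetic stand-ins for y_test2 and predicted1 (flattened to 1-D).
actual = np.array([-12.0, -10.0, -11.0, -9.0, -8.0])
predicted = np.array([-11.5, -10.5, -10.0, -9.5, -8.5])

# Persistence baseline: tomorrow's value is predicted to equal today's.
baseline = actual[:-1]

print("model    RMSE:", rmse(actual[1:], predicted[1:]), "MAE:", mae(actual[1:], predicted[1:]))
print("baseline RMSE:", rmse(actual[1:], baseline), "MAE:", mae(actual[1:], baseline))
```

In the notebook, the same functions could be applied to `y_test2.ravel()` and `predicted1.ravel()`; a model is only useful if it beats the persistence baseline.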

**In the end, the predicted and actual values are nearly equal.**