<a href="https://colab.research.google.com/github/kashish1203/minecrafters/blob/Riya/Copy_of_DM_CP_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Data Source:** The data is collected from analogue and digital sensors installed on the APU (Air Production Unit) of a metro train's compressor. These sensors monitor different aspects of the compressor's operation.

**Data Type:** The dataset contains multivariate time series data, which means that readings are taken over time intervals, and multiple variables are recorded simultaneously.

**Sensors:** The dataset includes readings from the following sensors:

- **Pressure Sensor:** Monitors pressure levels within the APU.
- **Temperature Sensor:** Measures the temperature of the APU.
- **Motor Current Sensor:** Records the electrical current consumed by the compressor's motor.
- **Air Intake Valve Sensor:** Monitors the status or position of the air intake valve.

**Predictive Maintenance:** By analyzing the patterns and trends in sensor readings over time, it's possible to identify potential anomalies, degradation, or malfunctions in the compressor's APU. This information can help predict when maintenance or repairs might be needed, thereby optimizing maintenance schedules and preventing unexpected breakdowns.
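
One simple way to look for anomalies in a sensor reading is a rolling z-score: compare each reading with the mean and standard deviation of the readings just before it. The sketch below is only an illustration on synthetic data; the function name `flag_anomalies`, the window of 60 samples, the threshold of 3, and the injected spike are all assumptions, not part of this notebook's analysis.

```python
import numpy as np
import pandas as pd

def flag_anomalies(series: pd.Series, window: int = 60, threshold: float = 3.0) -> pd.Series:
    """Flag readings more than `threshold` rolling standard deviations from the rolling mean."""
    # Shift by one so each reading is compared only with earlier readings;
    # otherwise a spike would inflate its own baseline.
    baseline = series.shift(1).rolling(window, min_periods=window)
    z = (series - baseline.mean()) / baseline.std()
    # Comparisons with NaN (the first `window` readings) evaluate to False.
    return z.abs() > threshold

# Synthetic stand-in for an oil-temperature signal (hypothetical values).
rng = np.random.default_rng(0)
temps = pd.Series(60 + rng.normal(0, 0.5, 500))
temps.iloc[400] += 10  # injected spike that plays the role of a fault

flags = flag_anomalies(temps)
print(flags[flags].index.tolist())
```

On the real data the same function could be applied to a column such as `df['Oil_temperature']`, with the window and threshold tuned to the sampling rate of about one reading every 10 seconds.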

**What is an APU?**\
-- In this dataset, APU stands for Air Production Unit, the term also used in the attribute descriptions below. It is the compressor assembly that produces compressed air for the train's pneumatic systems: the air is compressed, passed through a cyclonic separator filter and air-drying towers, and delivered to reservoirs that feed the pneumatic panel. A failure in the APU can therefore take the train out of service, which is why its sensor readings are worth monitoring.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('/content/drive/MyDrive/MetroPT3(AirCompressor).csv')

In [None]:
df

Unnamed: 0.1,Unnamed: 0,timestamp,TP2,TP3,H1,DV_pressure,Reservoirs,Oil_temperature,Motor_current,COMP,DV_eletric,Towers,MPG,LPS,Pressure_switch,Oil_level,Caudal_impulses
0,0,2020-02-01 00:00:00,-0.012,9.358,9.340,-0.024,9.358,53.600,0.0400,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
1,10,2020-02-01 00:00:10,-0.014,9.348,9.332,-0.022,9.348,53.675,0.0400,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
2,20,2020-02-01 00:00:19,-0.012,9.338,9.322,-0.022,9.338,53.600,0.0425,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
3,30,2020-02-01 00:00:29,-0.012,9.328,9.312,-0.022,9.328,53.425,0.0400,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
4,40,2020-02-01 00:00:39,-0.012,9.318,9.302,-0.022,9.318,53.475,0.0400,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1516943,15169430,2020-09-01 03:59:10,-0.014,8.918,8.906,-0.022,8.918,59.675,0.0425,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
1516944,15169440,2020-09-01 03:59:20,-0.014,8.904,8.888,-0.020,8.904,59.600,0.0450,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
1516945,15169450,2020-09-01 03:59:30,-0.014,8.890,8.876,-0.022,8.892,59.600,0.0425,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
1516946,15169460,2020-09-01 03:59:40,-0.012,8.876,8.864,-0.022,8.878,59.550,0.0450,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0


In [None]:
df.shape

(1516948, 17)

In [None]:
df.columns

Index(['Unnamed: 0', 'timestamp', 'TP2', 'TP3', 'H1', 'DV_pressure',
       'Reservoirs', 'Oil_temperature', 'Motor_current', 'COMP', 'DV_eletric',
       'Towers', 'MPG', 'LPS', 'Pressure_switch', 'Oil_level',
       'Caudal_impulses'],
      dtype='object')

**Description of Attributes:**\
Attributes in the dataset:
1. **Unnamed: 0:** An unnamed index or identifier for each record in the dataset.
2. **timestamp:** The timestamp indicating the time at which the readings were recorded.
3. **TP2:** Pressure sensor reading; measures the pressure on the compressor.
4. **TP3:** Pressure sensor reading; measures the pressure generated at the pneumatic panel.
5. **H1:** Pressure sensor reading; measures the pressure generated by the pressure drop that occurs when the cyclonic separator filter discharges.
6. **DV_pressure:** Pressure sensor reading; measures the pressure drop generated when the air-dryer towers discharge. A zero reading indicates that the compressor is operating under load.
7. **Reservoirs:** Downstream pressure of the reservoirs, which should be close to the pneumatic panel pressure (TP3).
8. **Oil_temperature:** Reading of oil temperature on the compressor.
9. **Motor_current:** Current of one phase of the three-phase motor. It takes values close to:
  - 0 A when the motor is off,
  - 4 A when working offloaded,
  - 7 A when working under load, and
  - 9 A when it starts working.
10. **COMP:** Electrical signal of the air intake valve on the compressor.
  - It is active when there is no air intake, indicating that the compressor is either turned off or operating in an offloaded state.
11. **DV_eletric:** Electrical signal that controls the compressor outlet valve.
  - It is active when the compressor is functioning under load.
  - It is inactive when the compressor is either off or operating in an offloaded state.
12. **Towers:** Electrical signal that defines which tower is drying the air and which tower is draining the humidity removed from the air.
  - When not active, it indicates that tower one is functioning.
  - When active, it indicates that tower two is in operation.
13. **MPG:** Electrical signal responsible for starting the compressor under load by activating the intake valve when the pressure in the air production unit (APU) falls below 8.2 bar. (Here MPG is a signal name, not miles per gallon.) It activates the COMP sensor, which then shows the same behaviour as the MPG sensor.
14. **LPS:** Reading of LPS (low pressure system); an electrical signal that is activated when the pressure drops below 7 bar.
15. **Pressure_switch:** Electrical signal that detects the discharge in the air-drying towers.
16. **Oil_level:** Electrical signal that detects the oil level on the compressor. It is active when the oil is below the expected values.
17. **Caudal_impulses:** Electrical signal that counts the pulse outputs generated by the absolute amount of air flowing from the APU to the reservoirs.
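
The nominal `Motor_current` values listed above suggest four operating states. A minimal sketch of how readings might be labelled with `pd.cut` follows. The bin edges (2.0, 5.5, 8.0) are hypothetical midpoints chosen for illustration, not values from the dataset documentation, and `motor_state` is an assumed helper name.

```python
import pandas as pd

# Hypothetical bin edges placed between the nominal currents quoted above
# (0 A off, 4 A offloaded, 7 A under load, 9 A starting); tune them on real data.
EDGES = [-float("inf"), 2.0, 5.5, 8.0, float("inf")]
LABELS = ["off", "offloaded", "under_load", "starting"]

def motor_state(current: pd.Series) -> pd.Series:
    """Map motor-current readings (in amperes) to a categorical operating state."""
    return pd.cut(current, bins=EDGES, labels=LABELS)

# Illustrative readings, one per state.
sample = pd.Series([0.04, 3.8, 6.9, 9.2])
states = motor_state(sample)
print(states.tolist())
```

On the full DataFrame this could be used as `df['motor_state'] = motor_state(df['Motor_current'])`, followed by `value_counts()` to see how much time the compressor spends in each state.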




Basic Analysis Steps:

1. **Loading Data:** Load the dataset into a Pandas DataFrame.
2. **Data Exploration:** Explore the first few rows of the dataset using `head()` to understand its structure.
3. **Summary Statistics:** Use `describe()` to get summary statistics for numeric attributes.
4. **Data Types:** Use `info()` to see the data types of attributes and check for missing values.
5. **Data Distribution:** Plot histograms or box plots for numeric attributes to understand their distributions.
6. **Time Series Analysis:** Convert the 'timestamp' column to a datetime data type and explore temporal patterns.
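
The steps above can be sketched on a small sample. To keep the example runnable without Drive access, it inlines the first five rows shown in the DataFrame output above (a subset of columns only); on the real data the same calls would be applied to `df`.

```python
import io
import pandas as pd

# First rows of the dataset as displayed above, restricted to a few columns.
CSV = """timestamp,TP2,TP3,Oil_temperature,Motor_current
2020-02-01 00:00:00,-0.012,9.358,53.600,0.0400
2020-02-01 00:00:10,-0.014,9.348,53.675,0.0400
2020-02-01 00:00:19,-0.012,9.338,53.600,0.0425
2020-02-01 00:00:29,-0.012,9.328,53.425,0.0400
2020-02-01 00:00:39,-0.012,9.318,53.475,0.0400
"""

sample = pd.read_csv(io.StringIO(CSV))  # 1. load the data
print(sample.head())                    # 2. first rows
print(sample.describe())                # 3. summary statistics
sample.info()                           # 4. dtypes and non-null counts

# 5. distributions: uncomment in a notebook with matplotlib installed
# sample.hist(figsize=(8, 6))

# 6. parse timestamps, index by them, and inspect the sampling interval
sample["timestamp"] = pd.to_datetime(sample["timestamp"])
sample = sample.set_index("timestamp")
intervals = sample.index.to_series().diff().dropna()
print(intervals.value_counts())
```

Checking the intervals is worthwhile because the readings are not perfectly regular: in the rows above, the gap between consecutive timestamps is sometimes 9 seconds rather than 10.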


In [None]:
df.describe()

Unnamed: 0.1,Unnamed: 0,TP2,TP3,H1,DV_pressure,Reservoirs,Oil_temperature,Motor_current,COMP,DV_eletric,Towers,MPG,LPS,Pressure_switch,Oil_level,Caudal_impulses
count,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0,1516948.0
mean,7584735.0,1.367826,8.984611,7.568155,0.05595619,8.985233,62.64418,2.050171,0.8369568,0.1606106,0.9198483,0.832664,0.003420025,0.9914368,0.9041556,0.9371066
std,4379053.0,3.25093,0.6390951,3.3332,0.3824015,0.638307,6.516261,2.302053,0.3694052,0.3671716,0.271528,0.3732757,0.05838091,0.09214078,0.2943779,0.2427712
min,0.0,-0.032,0.73,-0.036,-0.032,0.712,15.4,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,3792368.0,-0.014,8.492,8.254,-0.022,8.494,57.775,0.04,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
50%,7584735.0,-0.012,8.96,8.784,-0.02,8.96,62.7,0.045,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
75%,11377100.0,-0.01,9.492,9.374,-0.018,9.492,67.25,3.8075,1.0,0.0,1.0,1.0,0.0,1.0,1.0,1.0
max,15169470.0,10.676,10.302,10.288,9.844,10.3,89.05,9.295,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [None]:
# Checking null values
df.isnull().sum()

Unnamed: 0         0
timestamp          0
TP2                0
TP3                0
H1                 0
DV_pressure        0
Reservoirs         0
Oil_temperature    0
Motor_current      0
COMP               0
DV_eletric         0
Towers             0
MPG                0
LPS                0
Pressure_switch    0
Oil_level          0
Caudal_impulses    0
dtype: int64

**No Null Values are present in the dataset.**

In [None]:
#Checking duplicate rows
df.duplicated().sum()

0

**No Duplicate Instances are present.**

In [None]:
# Identifying numerical and categorical columns
df.dtypes

Unnamed: 0           int64
timestamp           object
TP2                float64
TP3                float64
H1                 float64
DV_pressure        float64
Reservoirs         float64
Oil_temperature    float64
Motor_current      float64
COMP               float64
DV_eletric         float64
Towers             float64
MPG                float64
LPS                float64
Pressure_switch    float64
Oil_level          float64
Caudal_impulses    float64
dtype: object

In [None]:
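# Count the distinct values in each sensor column
# (columns 2 to 16, i.e. everything after the index and the timestamp)
num_col = df.iloc[:, 2:17]
for name in num_col.columns:
  # nunique() counts distinct values without building and sorting the full list
  print(f'{name}: {num_col[name].nunique()}')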

TP2: 5257
TP3: 3683
H1: 2665
DV_pressure: 2257
Reservoirs: 3682
Oil_temperature: 2462
Motor_current: 1809
COMP: 2
DV_eletric: 2
Towers: 2
MPG: 2
LPS: 2
Pressure_switch: 2
Oil_level: 2
Caudal_impulses: 2
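
The counts above split the columns into two groups: analogue readings with thousands of distinct values and digital signals with only two. A small sketch of how that split might be automated is shown below. The helper name `split_by_cardinality` and the toy DataFrame are assumptions for illustration.

```python
import pandas as pd

def split_by_cardinality(frame: pd.DataFrame, max_levels: int = 2):
    """Return (digital, analogue) column-name lists based on distinct-value counts."""
    counts = frame.nunique()
    digital = counts[counts <= max_levels].index.tolist()
    analogue = counts[counts > max_levels].index.tolist()
    return digital, analogue

# Toy data with hypothetical values, using three of the dataset's column names.
toy = pd.DataFrame({
    "Motor_current": [0.04, 3.8, 6.9, 9.2],
    "COMP": [1.0, 0.0, 0.0, 1.0],
    "LPS": [0.0, 0.0, 1.0, 0.0],
})
digital, analogue = split_by_cardinality(toy)
print(digital, analogue)
```

Columns identified as digital could then be cast to a compact type, for example `df[digital].astype('int8')`, which reduces memory use on a dataset of about 1.5 million rows.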


In [None]:
import pandas as pd
import statsmodels.api as sm

target_column = 'Motor_current'
y = df[target_column]

# Define the features for the model
feature_columns = ['TP2', 'TP3', 'H1', 'Reservoirs', 'Oil_temperature', 'DV_pressure', 'COMP',
                   'DV_eletric', 'Towers', 'MPG', 'LPS', 'Pressure_switch', 'Oil_level', 'Caudal_impulses']

# Create a DataFrame with only the selected features
X = df[feature_columns]

# Add a constant term to the predictor matrix (intercept)
X = sm.add_constant(X)

# Fit the OLS model
model = sm.OLS(y, X).fit()

# Print the summary of the model to see feature statistics and significance
print(model.summary())


                            OLS Regression Results                            
Dep. Variable:          Motor_current   R-squared:                       0.760
Model:                            OLS   Adj. R-squared:                  0.760
Method:                 Least Squares   F-statistic:                 3.440e+05
Date:                Sat, 26 Aug 2023   Prob (F-statistic):               0.00
Time:                        13:55:14   Log-Likelihood:            -2.3334e+06
No. Observations:             1516948   AIC:                         4.667e+06
Df Residuals:                 1516933   BIC:                         4.667e+06
Df Model:                          14                                         
Covariance Type:            nonrobust                                         
                      coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------
const             -14.2263      0.019   -7

In [1]:
! git --version

git version 2.34.1


In [2]:
!git config --global user.email "kumari.riya1925@gmail.com"

!git config --global user.name "RiyaKumari19"

In [6]:
!git clone https://<PERSONAL_ACCESS_TOKEN>@github.com/kashish1203/minecrafters.git

Cloning into 'minecrafters'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 10 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (10/10), 2.80 MiB | 12.21 MiB/s, done.
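
A personal access token typed directly into a cell is saved with the notebook and is visible to anyone who can open it. One possible alternative, sketched here, is to prompt for the token at run time with `getpass`. The helper names `build_clone_url` and `clone_with_prompt` are hypothetical, and the URL form simply mirrors the clone command used above.

```python
import getpass
import subprocess

def build_clone_url(owner: str, repo: str, token: str) -> str:
    """Assemble an HTTPS clone URL that carries the token as the user part."""
    return f"https://{token}@github.com/{owner}/{repo}.git"

def clone_with_prompt(owner: str, repo: str) -> None:
    """Ask for the token interactively, then clone the repository."""
    # getpass hides the input, so the token is not stored in the cell source.
    token = getpass.getpass("GitHub token: ")
    subprocess.run(["git", "clone", build_clone_url(owner, repo, token)], check=True)

# In a notebook cell:
# clone_with_prompt("kashish1203", "minecrafters")
```

Note that git still records the remote URL, token included, in the clone's `.git/config`, so this keeps the token out of the notebook but not out of the cloned directory.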


In [7]:
!ls

minecrafters  sample_data


In [8]:
%cd minecrafters

/content/minecrafters


In [9]:
!git status

On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean


In [11]:
!git branch -a

* main
  remotes/origin/Chinmaya
  remotes/origin/HEAD -> origin/main
  remotes/origin/Riya
  remotes/origin/kashish
  remotes/origin/main


In [12]:
!git checkout Riya

Branch 'Riya' set up to track remote branch 'Riya' from 'origin'.
Switched to a new branch 'Riya'


In [13]:
!git add .

!git commit -m "commit trial"

!git push

On branch Riya
Your branch is up to date with 'origin/Riya'.

nothing to commit, working tree clean
Everything up-to-date


In [14]:
!git log

commit ed51d89148059433d4be34804b4ff215c77e53af (HEAD -> Riya, origin/main, origin/Riya, origin/HEAD, origin/Chinmaya, main)
Author: Chinmaya Pandey <137144018+Chinmaya54@users.noreply.github.com>
Date:   Sun Aug 27 18:44:06 2023 +0530

    Created using Colaboratory

commit a85acd3086a93c7ea0fee3fc4248f0bd9ef903d9
Author: kashish <75682006+kashish1203@users.noreply.github.com>
Date:   Sun Aug 27 18:37:15 2023 +0530

    Initial commit


In [15]:
#this is our new git tutorial !!!

In [16]:
!git push

Everything up-to-date
