    Department Of Computer Science
    COMP4381, SP.TOP: DATA SCIENCE AND ANALYTICS
    Dr. Hussein Soboh
    COMP4381 | Section 1 

## <div align=center> Assignment #7 </div>
<div align=center><b> 3D prints roughness dataset</b></div>
<div align=center>Linear Regression pipeline for the Roughness of the 3D prints</div>

    Prepared by: Sondos Aabed
    ID: 1190652

## Table of Contents

- Introduction
- Tools and Versions
- Data Analysis Process
    - Data Wrangling
    - Data preparation for modeling
        - Feature scaling.
        - Feature selection.
        - Data splitting.
- Data Modeling Process
    - Algorithm
    - Training 
    - Testing
    - Evaluation
        - performance metrics
        - bias, variance tradeoff
- Insights and Conclusions
    
<hr>

## Introduction
Anyone working with 3D printed parts may need to reinforce them, either entirely or locally, to improve their strength and durability. The reinforced region may be the whole part or a specific area subjected to a load such as compression, tension, shear, torsion, or bending. [1]
The aim of this notebook is to determine how much the adjustable parameters of a 3D printer affect print quality, accuracy, and strength, which makes it a product quality task. There are nine setting parameters and three measured output parameters, one of which (roughness) is the target.

![5l7W9Cj1eGhVgFhuIfNKzirVA2v861pZ4xIW84T4qOw](https://github.com/sondosaabed/SP.TOP-Data-Science-and-Analytics/assets/65151701/cbf8ec7f-7490-4e67-8cfb-b37ac1cf4799)

**Figure 1:** Zurich Artificial parts [4]

In this assignment, a dataset of 3D print roughness and related features is used. Roughness measures how rough the surface of a 3D printed part is. It is the target feature for this assignment: a numerical value that will be predicted using linear regression.

### About the dataset

This dataset comes from research by TR/Selcuk University Mechanical Engineering department.[3]

Here is the [Kaggle Link of the Dataset](https://www.kaggle.com/datasets/afumetto/3dprinter/data?select=data.csv)

The dataset contains the following features:

|Feature|Type|Description|
|-----|-----|-----|
|Layer Height (mm)|numerical|Thickness of each printed layer.|
|Wall Thickness (mm)|numerical|Thickness of the outer walls of the object.|
|Infill Density (%)|numerical|Percentage of the object's interior filled with material.|
|Infill Pattern|ordinal|The geometric pattern used to fill the interior of the object.|
|Nozzle Temperature (°C)|numerical|Temperature of the material exiting the printer nozzle.|
|Bed Temperature (°C)|numerical|Temperature of the printer bed where the object is laid down.|
|Print Speed (mm/s)|numerical|Speed at which the printer nozzle travels while printing.|
|Material|nominal|The filament or material used for printing the object.|
|Fan Speed (%)|numerical|Speed of the cooling fan as a percentage of its maximum.|

The target feature is: Roughness (µm)

The following figure shows what different infill patterns and infill densities look like.
![main-qimg-70b737714f100e1b57c6c22d5d60effb](https://github.com/sondosaabed/SP.TOP-Data-Science-and-Analytics/assets/65151701/0c15e3ba-431e-40c0-b0f2-21d0401ad8fe)

**Figure 2:** fill patterns in 3D printing [2]

![image](https://github.com/sondosaabed/SP.TOP-Data-Science-and-Analytics/assets/65151701/73270f3c-15bd-464e-80f9-37904260d7f3)

**Figure 3:** Material PLA vs ABS [5]

<hr>

## Tools and Versions

The following tools and versions are used throughout this report:

|Tool | Version |
|-----|---------|
|Python|3.12.2|
|Numpy|1.26.4|
|Matplotlib|3.8.2|
|Pandas|2.2.1|
|Scikit-learn||
|Visual Studio Code |Updated|
|Git & github|[Repo.](https://github.com/sondosaabed/SP.TOP-Data-Science-and-Analytics/blob/main/Assignments/A7-3D-prints-Roughness/1190652_A7.ipynb)|

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

<hr>

## Data Analysis Process

### Data Wrangling

In this section, the data wrangling process is followed: the dataset is first loaded, then assessed and cleaned. This covers structural problems, duplicates, missing values, and outliers.

#### Loading the Dataset

In [None]:
def load_data(path="./3d-printing-roughness.csv"):
    """
    Loads the CSV data into a pandas DataFrame.
    Args:
        path (str): path to the data; the default value is the file name
    Returns:
        (pd.DataFrame | None): the file data, or None if the path is not a .csv file
    """
    df = None
    if path.endswith(".csv"):
        df = pd.read_csv(path)
    return df

In [None]:
df = load_data()
df.head()

<hr>

#### Assessing and Cleaning the datasets
In this section the following steps will be conducted:
- Assess and handle Columns and Data types
- Assess and handle Duplicates
- Assess and handle Missing Values
- Assess and handle Outliers

##### Assessing and handling Columns and Data types

- One of the requirements is to have an ordinal feature. The infill pattern is treated as ordinal here: it carries a notion of order in terms of structure, where a higher rank means a more complex pattern.

In [None]:
df.info()

In [None]:
df.nunique()

The infill_pattern categorical feature has a notion of order (a higher value means a more complex structure), so instead of one-hot encoding it is encoded as grid = 0 and honeycomb = 1.

In [None]:
df.infill_pattern = [0 if each == "grid" else 1 for each in df.infill_pattern] 

The second categorical feature, material, has no notion of order, so it is one-hot encoded:

In [None]:
df = pd.get_dummies(df, columns=['material'], dtype=int)

In [None]:
df.sample(5)

> Now every column in the dataset is numerical.
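
The two encodings above can be compared on a small example. This is a minimal sketch on toy rows: the values `abs` and `pla` and the `pattern_rank` dictionary are assumptions made for illustration, not taken from the dataset itself.

```python
import pandas as pd

# Toy rows (hypothetical values) that mimic the two categorical columns.
toy = pd.DataFrame({
    "infill_pattern": ["grid", "honeycomb", "grid"],
    "material": ["abs", "pla", "abs"],
})

# Ordinal feature: map each category to its rank (grid = 0, honeycomb = 1).
pattern_rank = {"grid": 0, "honeycomb": 1}
toy["infill_pattern"] = toy["infill_pattern"].map(pattern_rank)

# Nominal feature: one indicator column per category, no implied order.
toy = pd.get_dummies(toy, columns=["material"], dtype=int)

print(toy)
```

Unlike the list comprehension, which sends every value other than `grid` to 1, `map` with an explicit dictionary produces `NaN` for an unexpected category, which makes a misspelled or new pattern easier to notice.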

##### Assess and handle Duplicates
Now let's check for duplicates and handle them

In [None]:
df.duplicated().any()

> No duplicate records were found.

##### Assess and handle Missing Values
This step detects and handles the missing values.

In [None]:
df.isna().sum().sort_values()

> No missing values were found.

##### Assess and handle Outliers
Now let's check for outliers with visualization using boxplot.

In [None]:
df.plot(kind='box',figsize=(15, 6));
plt.xlabel('Columns')  
plt.ylabel('Values') 
plt.grid(True, alpha=0.2)
plt.minorticks_on()
plt.suptitle('Figure 4: Box plot of the 3D prints Roughness Dataset', size=20)
plt.tick_params(axis='x', rotation=70) 
plt.show()

> The print_speed has an upper bound outlier.

In [None]:
df['print_speed'].plot(kind='box',figsize=(4, 4));
plt.xlabel('Column')  
plt.ylabel('Values') 
plt.grid(True, alpha=0.2)
plt.minorticks_on()
plt.suptitle('Figure 5: Box plot of print_speed', size=20)
plt.tick_params(axis='x') 
plt.show()

> The outlier is the record with a print speed of 120. Let's take a closer look at the records that have that speed.

In [None]:
df[df['print_speed']== 120]

> Several records have a print speed of 120, so they are kept as a reasonable printing speed.
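
A box plot flags a point as an outlier when it lies beyond the whiskers, which Matplotlib places by default at 1.5 times the interquartile range (IQR) from the quartiles. The sketch below applies that rule numerically. The helper `iqr_bounds` and the toy speed values are hypothetical and only illustrate the rule.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) limits of the k * IQR rule for a Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy print speeds (hypothetical values, not the real column).
speeds = pd.Series([40, 40, 40, 60, 60, 60, 60, 120], name="print_speed")

lower, upper = iqr_bounds(speeds)
outliers = speeds[(speeds < lower) | (speeds > upper)]

print(lower, upper)
print(outliers.tolist())
```

A value flagged this way is only statistically unusual. Whether it is an error or a legitimate setting is a separate judgment, as in the decision above to keep the records.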

## Data preparation for modeling
In this section the following steps will be conducted:
- Splitting into testing and training subsets.
- Feature scaling.
- Feature Selection and correlation.

### Feature Selection and correlation
The relationships between the features and the target are inspected with a scatter matrix and a correlation heatmap to guide feature selection.

In [None]:
scatter_matrix = pd.plotting.scatter_matrix(df,  figsize=(12, 12), diagonal='kde')
plt.suptitle('Figure 6: Scatter Matrix of 3D prints DataFrame', size=20)
for ax in scatter_matrix.ravel():
    ax.set_xlabel(ax.get_xlabel(), rotation=45, ha='right')
    ax.set_ylabel(ax.get_ylabel(), rotation=45, ha='right')

> It is noticed that 

In [None]:
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt=".2f", cmap='coolwarm', vmin=-1, vmax=1, cbar=True, linewidths=0.5, square=True)      
plt.title('Figure 7: Correlation Matrix of DataFrame', size=20)
plt.xticks(rotation=45, ha='right')  
plt.tight_layout()
plt.show()

> It is noticed that

In [None]:
selected_features = ['']
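
One possible way to turn a correlation matrix into a feature list is to keep the features whose absolute correlation with the target reaches a threshold. This is only a sketch: the helper `select_by_correlation`, the threshold of 0.2, and the toy values are assumptions, and it is not necessarily the selection rule used in this notebook.

```python
import pandas as pd

def select_by_correlation(frame, target, threshold=0.2):
    """Keep features whose |Pearson correlation| with the target >= threshold."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    return corr[corr.abs() >= threshold].index.tolist()

# Toy data (hypothetical values): one feature tracks the target, one does not.
toy = pd.DataFrame({
    "layer_height": [0.02, 0.06, 0.10, 0.15, 0.20],
    "fan_speed": [50, 0, 100, 0, 50],
    "roughness": [25, 90, 170, 240, 330],
})

chosen = select_by_correlation(toy, target="roughness")
print(chosen)
```

Pearson correlation only captures linear relationships, so a feature dropped by this rule may still matter in a nonlinear way.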

### Dataset Splitting
The dataset is split 80:20 into training and testing subsets.

In [None]:
X = df[selected_features]
y = df['roughness']
display(X.head())
display(y.head())

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1190652)
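
The effect of `test_size=0.2` and a fixed `random_state` can be checked on a small frame. The names `toy_X` and `toy_y` and the seed 0 are placeholders chosen for this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten toy rows (hypothetical data).
toy_X = pd.DataFrame({"feature": np.arange(10)})
toy_y = pd.Series(np.arange(10) * 2.0, name="target")

# Same seed twice: the split is reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(toy_X, toy_y, test_size=0.2, random_state=0)
X_tr2, X_te2, _, _ = train_test_split(toy_X, toy_y, test_size=0.2, random_state=0)

print(len(X_tr), len(X_te))            # sizes of the two subsets
print(X_te.index.equals(X_te2.index))  # same rows selected both times
```

Fixing `random_state` means every rerun of the notebook evaluates the model on the same held-out rows, so metric changes reflect changes in the pipeline rather than in the split.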

### Feature scaling
Since numerical outliers were detected, the feature scaling will be performed using the standard scaler. 

In [None]:
X_train.describe().T

> Looking at the minimum, maximum, and range of each numerical feature, the features are on different scales, so feature scaling is applied.

In [None]:
scaler = StandardScaler()
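# Fit on the training subset only, and keep the original column names and row index
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train), columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(scaler.transform(X_test), columns=X_test.columns, index=X_test.index)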
display(X_train_scaled.sample(5))
display(X_train_scaled.describe().T)

> Now the ranges and the minimum and maximum values of all features are on a comparable scale.
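
Standard scaling replaces each value x with z = (x - mean) / std, where the mean and standard deviation are estimated on the training subset only and then reused for the test subset. The sketch below checks this against `StandardScaler` on toy arrays. The array values and the names `toy_scaler` and `manual` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training and test rows (hypothetical values, two features).
train = np.array([[40.0, 200.0], [60.0, 220.0], [120.0, 250.0]])
test = np.array([[60.0, 210.0]])

toy_scaler = StandardScaler().fit(train)
train_scaled = toy_scaler.transform(train)
test_scaled = toy_scaler.transform(test)

# Manual z-score using the training statistics (population std, ddof=0).
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(train_scaled.mean(axis=0))  # approximately 0 for each feature
print(train_scaled.std(axis=0))   # approximately 1 for each feature
print(np.allclose(test_scaled, manual))
```

Fitting the scaler on the training subset alone keeps information about the test rows out of the preprocessing step.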

## Data Modeling Process
In this section, two linear regression models are trained: the first with all the features and the second with only the features selected based on their correlation with the target.

### Linear regression models

In [None]:
LRM1 = LinearRegression() ## All Features
LRM2 = LinearRegression() ## Selected Features

### Training

In [None]:
history1 = LRM1.fit(X_train_scaled,y_train)
history2 = LRM2.fit(X_train_scaled,y_train)

### Testing

In [None]:
# The models were trained on scaled features, so predict on the scaled test set
y_pred_1 = LRM1.predict(X_test_scaled)
y_pred_2 = LRM2.predict(X_test_scaled)

### Testing Vs. Training performance

In [None]:
##
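
One way to compare testing and training performance is to compute the same metrics on both subsets. The following is a minimal sketch on synthetic data, not the notebook's results: the data, the helper `report`, and the use of `r2_score` alongside `mean_squared_error` are assumptions added for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (hypothetical): a linear signal plus noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# Scale with training statistics, then fit on the scaled training subset.
demo_scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = demo_scaler.transform(X_tr), demo_scaler.transform(X_te)
model = LinearRegression().fit(X_tr_s, y_tr)

def report(name, X_part, y_part):
    """Print and return MSE and R^2 of the fitted model on one subset."""
    pred = model.predict(X_part)
    mse = mean_squared_error(y_part, pred)
    r2 = r2_score(y_part, pred)
    print(f"{name}: MSE={mse:.3f}, R2={r2:.3f}")
    return mse, r2

train_mse, train_r2 = report("train", X_tr_s, y_tr)
test_mse, test_r2 = report("test", X_te_s, y_te)
```

A test error far above the training error would suggest overfitting, while high error on both subsets would suggest underfitting.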

### Evaluation

#### Performance Metrics

#### Bias, Variance Tradeoff

## References
- [1] https://the3dbros.com/3d-print-infill-patterns-explained/
- [2] https://3dsolved.com/how-to-make-stronger-3d-prints-step-by-step-guide/
- [3] https://www.kaggle.com/datasets/afumetto/3dprinter/data?select=data.csv
- [4] https://www.weforum.org/agenda/2023/11/robotics-3d-printing-smartphones-space-technology-november/
- [5] https://3d2go.com.ph/blog/abs-vs-pla-filaments/
- [6] https://medium.com/@ahmet17/makina-m%C3%BChendisleri-i%C3%A7in-derin-%C3%B6%C4%9Frenme-3d-printer-veri-setinin-i%CC%87ncelenmesi-6fe1f48e0cdb