
# Linear Regression in Python 3.x
## Multiple Linear Regression for Arc Welding Data
### Anirudh Jonnalagadda, PhD
##### Shell Postdoctoral Fellow @ CDS, IISc
###### (anirudhj@iisc.ac.in)

Data taken from [Pal _et al._ (2008), Journal of Materials Processing Technology](https://doi.org/10.1016/j.jmatprotec.2007.09.039)

### Linear Regression Using Scikit-learn

In [None]:
# # for google colab
# !git clone https://github.com/jAnirudh/SVNIT.git

In [None]:
# # for google colab
# import os
# os.chdir('SVNIT') # change directory

In [None]:
import pandas
dataframe = pandas.read_csv('arc_welding.csv')

In [None]:
# Let's see the contents of the dataframe
dataframe.head() # top 5 rows

In [None]:
dataframe.tail() # last 5 rows

In [None]:
# how many rows and columns do we have?
print('nrows = {:}; ncolumns = {:}'.format(len(dataframe), dataframe.columns.size))

In [None]:
# what if there are many columns? list the column names
dataframe.columns

In [None]:
# let's drop the "Experiment no." column
df = dataframe.drop('Experiment no.', axis = 1) # axis = 1 for a column, 0 for a row

In [None]:
df.head()

In [None]:
# say you want to see the values of a particular column
df['Background voltage (VB)']

In [None]:
# You can also select a column by position, which helps when you do not know the exact column names
df[df.columns[0]]

In [None]:
# It is therefore generally easier to just rename the columns to be more legible
df.columns = ['background_voltage', 'pulse_voltage', 'pulse_frequency', 'pulse_duty_factor', 
              'wire_feed_rate', 'table_feed_rate', 'rms_current', 'rms_voltage', 'uts']

In [None]:
df.head()

### Let's do regression

In [None]:
# isolate the independent variables (the input features: every column except the last)
X = df[df.columns[:-1]]
X.head()

In [None]:
# isolate the dependent variable (the target: the last column, uts)
Y = df[df.columns[-1]]

In [None]:
# Create the regression object
from sklearn import linear_model
model = linear_model.LinearRegression()

In [None]:
# fit
model.fit(X, Y)
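
To get a feel for what `fit` does, here is a minimal sketch that solves an ordinary least squares problem directly with NumPy and compares the result with `LinearRegression`. The data is synthetic, and the names `X_demo`, `y_demo`, `beta`, and `demo_model` are made up for this illustration; none of it comes from the arc welding dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (hypothetical, NOT the arc welding measurements)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_coef + 4.0 + rng.normal(scale=0.1, size=50)

# Least squares by hand: prepend a column of ones so the first
# entry of beta plays the role of the intercept
A = np.column_stack([np.ones(len(X_demo)), X_demo])
beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

# The same problem solved with scikit-learn
demo_model = LinearRegression().fit(X_demo, y_demo)

print('intercept:', beta[0], demo_model.intercept_)
print('coefficients:', beta[1:], demo_model.coef_)
```

Both routes should agree up to floating point round-off, since by default `LinearRegression` fits an intercept and minimises the sum of squared residuals.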

In [None]:
# get the regression coefficients
model.coef_
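
`coef_` is a plain array, so it is easy to lose track of which number belongs to which column. One possible way to keep them labelled is to wrap the array in a pandas `Series`. The sketch below uses a tiny made-up dataframe (`toy`, with columns `a` and `b`) rather than the welding data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data, constructed so that y = 3*a - 2*b + 1 exactly
toy = pd.DataFrame({'a': [0.0, 1.0, 2.0, 3.0, 4.0],
                    'b': [1.0, 0.0, 2.0, 1.0, 3.0]})
toy_y = 3 * toy['a'] - 2 * toy['b'] + 1

toy_model = LinearRegression().fit(toy, toy_y)

# Pair each coefficient with the name of the column it multiplies
coef_table = pd.Series(toy_model.coef_, index=toy.columns, name='coefficient')
print(coef_table)
```

The same pattern should carry over to the notebook's objects, for example `pandas.Series(model.coef_, index=X.columns)`.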

In [None]:
# get the regression intercept
model.intercept_
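
The coefficients and the intercept together define the fitted model: a prediction is the intercept plus the sum of each input multiplied by its coefficient. The sketch below checks this against `predict` and also reports the coefficient of determination via `score`. It again uses synthetic data with made-up names (`X_demo`, `y_demo`, `demo_model`), so the numbers say nothing about the welding data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (hypothetical, NOT the arc welding measurements)
rng = np.random.default_rng(1)
X_demo = rng.uniform(size=(40, 2))
y_demo = 5.0 + X_demo @ np.array([1.5, -0.7]) + rng.normal(scale=0.05, size=40)

demo_model = LinearRegression().fit(X_demo, y_demo)

# Prediction by hand: intercept + sum of (coefficient * input)
manual_pred = X_demo @ demo_model.coef_ + demo_model.intercept_

# Prediction through the scikit-learn API
sklearn_pred = demo_model.predict(X_demo)

# R^2 on the data used for fitting
r_squared = demo_model.score(X_demo, y_demo)

print('predictions match:', np.allclose(manual_pred, sklearn_pred))
print('R^2 on the fitted data:', r_squared)
```

Note that an R^2 computed on the same data used for fitting tends to be optimistic; holding out part of the data for testing gives a fairer estimate.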