# Linear Regression fitting some Neighborhood Degree-based Topological Indices
### Some Neighborhood Degree-based Topological Indices of Hyaluronic Acid Methotrexate Conjugates 
<p align="center">
Vignesh Ravi(a), Natarajan Chidambaram(b), R.U. Gobithaasan(c,∗), Narasimhan Devadoss(b), Kenjiro T. Miura(d)

- (a) Division of Mathematics, Vellore Institute of Technology, Chennai, Tamil Nadu, India.
- (b) Department of Mathematics, Srinivasa Ramanujan Centre, SASTRA Deemed to be University, Kumbakonam, 612 001, Tamil Nadu, India
- (c) School of Mathematical Sciences, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia. [Official Website](https://math.usm.my/academic-profile/705-gobithaasan-rudrusamy)
- (d) Graduate School of Engineering, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu, 432-8011, Japan
</p>
 



---

Linear regression is a fundamental statistical method that models the relationship between a dependent variable (target) and one or more independent variables (predictors) by fitting a linear equation to observed data.

- It assumes a linear relationship between the variables and seeks the straight line that best fits the data, usually in the least-squares sense.

The equation of a simple linear regression with one independent variable can be represented as:

$ y = \beta_0 + \beta_1 x $

Where:
- $ y $ is the dependent variable: the value of a neighborhood degree-based topological index.
- $ x $ is the independent variable (the predictor): the number of conjugates $ n $.
- $ \beta_0 $ is the y-intercept (the value of $ y $ when $ x $ is 0).
- $ \beta_1 $ is the slope of the line (the effect of $ x $ on $ y $).
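For a single predictor, the least-squares estimates have a closed form: $ \hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2 $ and $ \hat\beta_0 = \bar y - \hat\beta_1 \bar x $. The sketch below evaluates them with NumPy on the M1N values of the open graphs for $ n = 1, \dots, 10 $, copied from the data table later in this notebook. It is an illustration of the formulas, not the code used to produce the results.

```python
import numpy as np

# Number of conjugates and the M1N values of the open graphs (from the data table)
x = np.arange(1, 11, dtype=float)
y = np.array([718, 1454, 2190, 2926, 3662, 4398, 5134, 5870, 6606, 7342], dtype=float)

x_mean, y_mean = x.mean(), y.mean()
# Slope: sum of cross-deviations divided by the sum of squared x-deviations
beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# Intercept: the least-squares line passes through the point of means
beta_0 = y_mean - beta_1 * x_mean

print(beta_1, beta_0)
```

The two numbers agree with the Slope and Intercept reported for M1N in the regression table.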

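We carry out the regression analysis in Python with the `linregress` function from `scipy.stats`, which returns the slope, the intercept, the correlation coefficient $ r $, the two-sided p-value for a zero slope, and the standard error of the slope.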
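A minimal `linregress` call on the ND2 values of the open graphs (copied from the open-graph data table) could look like this. Note that `rvalue` is Pearson's correlation coefficient $ r $, so $ R^2 $ has to be computed as `rvalue**2`.

```python
import numpy as np
from scipy.stats import linregress

# Number of conjugates and the ND2 values of the open graphs (from the data table)
x = np.arange(1, 11)
y = np.array([18.975992, 37.700705, 56.425418, 75.150131, 93.874844,
              112.599557, 131.32427, 150.048983, 168.773696, 187.498409])

# linregress returns a result object that can also be unpacked as a 5-tuple
result = linregress(x, y)
r_squared = result.rvalue ** 2  # rvalue is r, not R^2

print(f"slope={result.slope:.6f}, intercept={result.intercept:.6f}, R^2={r_squared:.6f}")
```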




### Regression lines for open graphs

```python
import pandas as pd
ham_open = pd.read_csv("data/HAM VALUES OPEN.csv")
ham_open
```

|   | n | M1N | M2N | HMN | FN | ND1 | ND2 | ND3 | ND4 | ND5 | SKN | SK1N | SK2N | mRN | NI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 718 | 2076 | 8508 | 4356 | 353.9543 | 18.975992 | 25498 | 11.797804 | 134.7583 | 359 | 1038.0 | 2727.0 | 10.2107 | 174.567 |
| 1 | 2 | 1454 | 4261 | 17434 | 8912 | 717.409047 | 37.700705 | 52934 | 23.249498 | 268.5023 | 727 | 2130.5 | 5558.5 | 20.2214 | 354.1193 |
| 2 | 3 | 2190 | 6446 | 26360 | 13468 | 1080.863794 | 56.425418 | 80370 | 34.701192 | 402.2463 | 1095 | 3223.0 | 8390.0 | 30.2321 | 533.6716 |
| 3 | 4 | 2926 | 8631 | 35286 | 18024 | 1444.318541 | 75.150131 | 107806 | 46.152886 | 535.9903 | 1463 | 4315.5 | 11221.5 | 40.2428 | 713.2239 |
| 4 | 5 | 3662 | 10816 | 44212 | 22580 | 1807.773288 | 93.874844 | 135242 | 57.60458 | 669.7343 | 1831 | 5408.0 | 14053.0 | 50.2535 | 892.7762 |
| 5 | 6 | 4398 | 13001 | 53138 | 27136 | 2171.228035 | 112.599557 | 162678 | 69.056274 | 803.4783 | 2199 | 6500.5 | 16884.5 | 60.2642 | 1072.3285 |
| 6 | 7 | 5134 | 15186 | 62064 | 31692 | 2534.682782 | 131.32427 | 190114 | 80.507968 | 937.2223 | 2567 | 7593.0 | 19716.0 | 70.2749 | 1251.8808 |
| 7 | 8 | 5870 | 17371 | 70990 | 36248 | 2898.137529 | 150.048983 | 217550 | 91.959662 | 1070.9663 | 2935 | 8685.5 | 22547.5 | 80.2856 | 1431.4331 |
| 8 | 9 | 6606 | 19556 | 79916 | 40804 | 3261.592276 | 168.773696 | 244986 | 103.411356 | 1204.7103 | 3303 | 9778.0 | 25379.0 | 90.2963 | 1610.9854 |
| 9 | 10 | 7342 | 21741 | 88842 | 45360 | 3625.047023 | 187.498409 | 272422 | 114.86305 | 1338.4543 | 3671 | 10870.5 | 28210.5 | 100.307 | 1790.5377 |
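Each column of this table grows by the same amount for every additional conjugate, which is why a straight line fits it exactly. A sketch that checks this for three columns with `DataFrame.diff` (the choice of HMN, ND5 and NI is arbitrary; values copied from the table): the constant first difference is the slope, and the value at $ n = 1 $ minus that difference is the intercept.

```python
import pandas as pd

# Three index columns of the open graphs for n = 1, ..., 10 (from the data table)
data = pd.DataFrame({
    "n": range(1, 11),
    "HMN": [8508, 17434, 26360, 35286, 44212, 53138, 62064, 70990, 79916, 88842],
    "ND5": [134.7583, 268.5023, 402.2463, 535.9903, 669.7343,
            803.4783, 937.2223, 1070.9663, 1204.7103, 1338.4543],
    "NI": [174.567, 354.1193, 533.6716, 713.2239, 892.7762,
           1072.3285, 1251.8808, 1431.4331, 1610.9854, 1790.5377],
})

indices = data.drop(columns="n")
# First differences: change in each index per additional conjugate
steps = indices.diff().dropna()
# Spread of the step sizes per column; (near) zero means the growth is linear in n
spread = steps.max() - steps.min()
# With a constant step d, the values satisfy y = (y_1 - d) + d * n
slope = steps.mean()
intercept = indices.iloc[0] - slope

print(pd.DataFrame({"slope": slope, "intercept": intercept, "spread": spread}))
```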


```python
from scipy.stats import linregress

dict_list = []

# Number of conjugates (the predictor x)
x = ham_open['n'].to_numpy()
# Drop the predictor column so that only the index columns remain
# (a new name keeps ham_open intact if the cell is re-run)
open_indices = ham_open.drop(columns=['n'])

# Fit one regression line per topological index
for column in open_indices:
    y = open_indices[column].to_numpy()
    # Least-squares fit of y = intercept + slope * x
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    # Coefficient of determination (linregress returns r, not R^2)
    r_squared = r_value**2

    row_dict = {'Graph': column, 'Slope': slope, 'Intercept': intercept,
                'R-squared': r_squared, 'P-value': p_value,
                'Standard Error': std_err}
    dict_list.append(row_dict)
```
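As a cross-check, `numpy.polyfit` accepts a two-dimensional `y` and fits every column in one call, so the loop can be replaced by a single least-squares fit. This is a swapped-in alternative, not the method used above; the small DataFrame is a stand-in for `ham_open` with two columns copied from the data table.

```python
import numpy as np
import pandas as pd

# Stand-in for ham_open: n plus two index columns (from the open-graph data table)
ham = pd.DataFrame({
    "n": range(1, 11),
    "M2N": [2076, 4261, 6446, 8631, 10816, 13001, 15186, 17371, 19556, 21741],
    "ND4": [11.797804, 23.249498, 34.701192, 46.152886, 57.60458,
            69.056274, 80.507968, 91.959662, 103.411356, 114.86305],
})

x = ham["n"].to_numpy(dtype=float)
Y = ham.drop(columns="n").to_numpy(dtype=float)  # shape (10, number of indices)

# Degree-1 fit of all columns at once: row 0 holds slopes, row 1 intercepts
coeffs = np.polyfit(x, Y, 1)
fits = pd.DataFrame(coeffs.T, index=ham.columns.drop("n"), columns=["Slope", "Intercept"])
print(fits)
```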


```python
df = pd.DataFrame(dict_list)
df
```

|   | Graph | Slope | Intercept | R-squared | P-value | Standard Error |
|---|---|---|---|---|---|---|
| 0 | M1N | 736.0 | -18.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 1 | M2N | 2185.0 | -109.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 2 | HMN | 8926.0 | -418.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 3 | FN | 4556.0 | -200.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 4 | ND1 | 363.454747 | -9.500447 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 5 | ND2 | 18.724713 | 0.251279 | 1.0 | 1.063695e-62 | 1.3951e-07 |
| 6 | ND3 | 27436.0 | -1938.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 7 | ND4 | 11.451694 | 0.34611 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 8 | ND5 | 133.744 | 1.0143 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 9 | SKN | 368.0 | -9.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
9,SKN,368.0,-9.0,1.0,4.3749999999999996e-80,0.0


```python
# Keep only the parameters of the fitted lines for the LaTeX table
df = df.drop(columns=['R-squared', 'P-value', 'Standard Error'])
print(df.to_latex(index=False, float_format="{:.3f}".format))
```

```latex
\begin{tabular}{lrr}
\toprule
Graph & Slope & Intercept \\
\midrule
M1N & 736.000 & -18.000 \\
M2N & 2185.000 & -109.000 \\
HMN & 8926.000 & -418.000 \\
FN & 4556.000 & -200.000 \\
ND1 & 363.455 & -9.500 \\
ND2 & 18.725 & 0.251 \\
ND3 & 27436.000 & -1938.000 \\
ND4 & 11.452 & 0.346 \\
ND5 & 133.744 & 1.014 \\
SKN & 368.000 & -9.000 \\
SK1N & 1092.500 & -54.500 \\
SK2N & 2831.500 & -104.500 \\
mRN & 10.011 & 0.200 \\
NI & 179.552 & -4.985 \\
\bottomrule
\end{tabular}
```
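The fitted lines give each index as a function of the number of conjugates $ n $. A sketch with a hypothetical helper `predict_index`, using three rows of the table above. Because the table rounds to three decimals, non-integer coefficients such as those of NI only approximate the data values, and any $ n > 10 $ is an extrapolation beyond the fitted range.

```python
# Slopes and intercepts of three open-graph indices, copied from the table above
FITTED_LINES = {
    "M1N": (736.0, -18.0),
    "SKN": (368.0, -9.0),
    "NI": (179.552, -4.985),
}

def predict_index(name: str, n: int) -> float:
    """Hypothetical helper: evaluate y = intercept + slope * n for one index."""
    slope, intercept = FITTED_LINES[name]
    return intercept + slope * n

# Reproduces a tabulated value (M1N at n = 10 is 7342 in the data table) ...
print(predict_index("M1N", 10))
# ... and extrapolates beyond the fitted range n = 1, ..., 10
print(predict_index("M1N", 20))
```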



### Regression lines for closed graphs

```python
import pandas as pd
ham_closed = pd.read_csv("data/HAM VALUES CLOSED.csv")
ham_closed
```

|   | n | M1NC | M2NC | HMNC | FNC | NCD1 | NCD2 | NCD3 | NCD4 | NCD5 | SKNC | SK1NC | SK2NC | mRNC | ISNC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1024 | 4175 | 17194 | 8844 | 503.042335 | 15.839647 | 72440 | 8.251583 | 137.4567 | 512 | 2087.5 | 4298.5 | 6.9845 | 247.2942 |
| 1 | 2 | 2070 | 8534 | 35098 | 18030 | 1017.634997 | 31.494529 | 149416 | 16.290186 | 274.082 | 1035 | 4267.0 | 8774.5 | 13.8413 | 500.6228 |
| 2 | 3 | 3116 | 12893 | 53002 | 27216 | 1532.227659 | 47.149411 | 226392 | 24.328789 | 410.7073 | 1558 | 6446.5 | 13250.5 | 20.6981 | 753.9514 |
| 3 | 4 | 4162 | 17252 | 70906 | 36402 | 2046.820321 | 62.804293 | 303368 | 32.367392 | 547.3326 | 2081 | 8626.0 | 17726.5 | 27.5549 | 1007.28 |
| 4 | 5 | 5208 | 21611 | 88810 | 45588 | 2561.412983 | 78.459175 | 380344 | 40.405995 | 683.9579 | 2604 | 10805.5 | 22202.5 | 34.4117 | 1260.6086 |
| 5 | 6 | 6254 | 25970 | 106714 | 54774 | 3076.005645 | 94.114057 | 457320 | 48.444598 | 820.5832 | 3127 | 12985.0 | 26678.5 | 41.2685 | 1513.9372 |
| 6 | 7 | 7300 | 30329 | 124618 | 63960 | 3590.598307 | 109.768939 | 534296 | 56.483201 | 957.2085 | 3650 | 15164.5 | 31154.5 | 48.1253 | 1767.2658 |
| 7 | 8 | 8346 | 34688 | 142522 | 73146 | 4105.190969 | 125.423821 | 611272 | 64.521804 | 1093.8338 | 4173 | 17344.0 | 35630.5 | 54.9821 | 2020.5944 |
| 8 | 9 | 9392 | 39047 | 160426 | 82332 | 4619.783631 | 141.078703 | 688248 | 72.560407 | 1230.4591 | 4696 | 19523.5 | 40106.5 | 61.8389 | 2273.923 |
| 9 | 10 | 10438 | 43406 | 178330 | 91518 | 5134.376293 | 156.733585 | 765224 | 80.59901 | 1367.0844 | 5219 | 21703.0 | 44582.5 | 68.6957 | 2527.2516 |


```python
from scipy.stats import linregress

dict_list = []

# Number of conjugates (the predictor x)
x = ham_closed['n'].to_numpy()
# Drop the predictor column so that only the index columns remain
closed_indices = ham_closed.drop(columns=['n'])

# Fit one regression line per topological index
for column in closed_indices:
    y = closed_indices[column].to_numpy()
    # Least-squares fit of y = intercept + slope * x
    slope, intercept, r_value, p_value, std_err = linregress(x, y)
    # Coefficient of determination (linregress returns r, not R^2)
    r_squared = r_value**2

    row_dict = {'Graph': column, 'Slope': slope, 'Intercept': intercept,
                'R-squared': r_squared, 'P-value': p_value,
                'Standard Error': std_err}
    dict_list.append(row_dict)
```
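The regression cells for the open and closed graphs differ only in their input table, so the loop could be wrapped in a function. A sketch with a hypothetical helper `fit_index_lines`, assuming the input has an `n` column plus one column per index, as `ham_open` and `ham_closed` do; the small stand-in below copies two columns from the closed-graph data table.

```python
import pandas as pd
from scipy.stats import linregress

def fit_index_lines(data: pd.DataFrame, x_col: str = "n") -> pd.DataFrame:
    """Hypothetical helper: fit y = intercept + slope * x for every column except x_col."""
    x = data[x_col].to_numpy()
    rows = []
    for column in data.columns.drop(x_col):
        res = linregress(x, data[column].to_numpy())
        rows.append({"Graph": column, "Slope": res.slope, "Intercept": res.intercept,
                     "R-squared": res.rvalue ** 2, "P-value": res.pvalue,
                     "Standard Error": res.stderr})
    return pd.DataFrame(rows)

# Stand-in for ham_closed (values from the closed-graph data table)
sample = pd.DataFrame({
    "n": range(1, 11),
    "M1NC": [1024, 2070, 3116, 4162, 5208, 6254, 7300, 8346, 9392, 10438],
    "mRNC": [6.9845, 13.8413, 20.6981, 27.5549, 34.4117,
             41.2685, 48.1253, 54.9821, 61.8389, 68.6957],
})
fits = fit_index_lines(sample)
print(fits)
```

With the notebook's data, `fit_index_lines(ham_open)` and `fit_index_lines(ham_closed)` would replace the two loops.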


```python
df = pd.DataFrame(dict_list)
df
```

|   | Graph | Slope | Intercept | R-squared | P-value | Standard Error |
|---|---|---|---|---|---|---|
| 0 | M1NC | 1046.0 | -22.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 1 | M2NC | 4359.0 | -184.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 2 | HMNC | 17904.0 | -710.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 3 | FNC | 9186.0 | -342.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 4 | NCD1 | 514.592662 | -11.550327 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 5 | NCD2 | 15.654882 | 0.184765 | 1.0 | 1.063695e-62 | 1.16638e-07 |
| 6 | NCD3 | 76976.0 | -4536.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 7 | NCD4 | 8.038603 | 0.21298 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 8 | NCD5 | 136.6253 | 0.8314 | 1.0 | 4.3749999999999996e-80 | 0.0 |
| 9 | SKNC | 523.0 | -11.0 | 1.0 | 4.3749999999999996e-80 | 0.0 |


```python
# Keep only the parameters of the fitted lines for the LaTeX table
df = df.drop(columns=['R-squared', 'P-value', 'Standard Error'])
print(df.to_latex(index=False, float_format="{:.3f}".format))
```

```latex
\begin{tabular}{lrr}
\toprule
Graph & Slope & Intercept \\
\midrule
M1NC & 1046.000 & -22.000 \\
M2NC & 4359.000 & -184.000 \\
HMNC & 17904.000 & -710.000 \\
FNC & 9186.000 & -342.000 \\
NCD1 & 514.593 & -11.550 \\
NCD2 & 15.655 & 0.185 \\
NCD3 & 76976.000 & -4536.000 \\
NCD4 & 8.039 & 0.213 \\
NCD5 & 136.625 & 0.831 \\
SKNC & 523.000 & -11.000 \\
SK1NC & 2179.500 & -92.000 \\
SK2NC & 4476.000 & -177.500 \\
mRNC & 6.857 & 0.128 \\
ISNC & 253.329 & -6.034 \\
\bottomrule
\end{tabular}
```

