A library for general statistics and machine learning (ML) tasks, documented in detail
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
Install the following before setting up the project:
Packages: Math
A step-by-step series of examples that tell you how to get a development environment running
To set up the project on your local machine, run the following command in Git Bash:
git clone https://github.com/ShaonMajumder/rml.git
- nroot(n,number) - returns the nth root of a number
- sum_all(*args) - returns the sum of all numbers passed as comma-separated arguments
- auto_include(starts_with='RML',extension='py') - automatically includes all py files whose names start with 'RML'
- print_dic(**kwargs) - prints key="value" pairs
- binary_search(iterable,target) - returns the position of the target
- quicksort(iterable) - returns the sorted iterable
- maxn(iterable) - returns the maximum number in an iterable
- minn(iterable) - returns the minimum number in an iterable
- conv_type(iterable,"type") - converts the items of an iterable to the given data type
- sum(iterable) - returns the sum of an iterable that contains numbers
- appends(iterable,object,index or empty) - appends an object at the end or at a specific position of an iterable, then returns the iterable
- swap(iterable, position1, position2) - swaps items according to their index and returns the new iterable
- length(iterable) - returns the length of an iterable or string
- getIndex(iterable,val) - returns the index of an item inside an iterable
- getIndexes(iterable) - returns the indexes of an iterable
- strip(string,strip_chars) - removes white space, newlines, or any special characters listed in the strip_chars iterable from the string and returns the result
- split(string,spliting_char = ',') - splits the string on the splitting character and returns the resulting list
- columns(df) - returns the columns of a dataframe object
- read_csv(input_file_url) - returns a dataframe object built from a CSV text file
- df_size(df) - returns the dataframe dimensions as (rows, columns)
- create_empty_dataframe(m,n) - returns an empty list with the given dimensions
- transpose(df) - returns the transposed matrix orientation of a data list or integer-indexed dictionary
- matrice(df) - accepts a matrix in matrix representation and converts it to the machine-readable list
- list_multiplication(dm1,dm2) - multiplies two lists of the same size
- mat_dot(dm1,dm2) - matrix dot multiplication between two dicts or lists
Machine Readable Dataframe list representation: [column1=[c1row1,c1row2,c1row3], column2=[c2row1,c2row2,c2row3]]
- classobject = DataFrame(dataframe = [[2,3,4],[2,5,2]]) - creates a dataframe from the Machine Readable Dataframe list representation
- classobject = DataFrame().read_csv(file_link) - creates a dataframe by reading a text file in CSV format
- classobject - gets the dataframe representation
- classobject[colindex] - accesses a column by index
- classobject[colindex][rowindex] - accesses a cell by 2D index
- classobject['columnname'] - accesses a column by name
- classobject.tolist - gets the list representation
- classobject.shape - shape of the dataframe
- classobject.columns - gets the column names
- classobject.columns=['low','up','freq'] - renames the columns
- classobject.create_dataframe(m,n,elm=None) - creates an m x n DataFrame with every element set to elm (None by default)
- classobject.T - returns the transpose
- classobject.transpose(change_self = True) - transposes the dataframe itself and also returns it
- classobject.conv_type('int',change_self=True) - changes the data type of the dataframe
- classobject.normalize() - normalizes the values of the dataframe
- classobject.concatenate(classobject1,classobject2,axis=0) - concatenates two dataframes along the x or y axis
- classobject.sum(axis=0) - sums the matrix along rows or columns, depending on axis
- classobject.substract(dm1,dm2) - returns the result of subtracting two matrices
- classobject.power(n) - raises the dataframe to the power of n, then returns it
- classobject.mat_dot(dm1,dm2) - matrix dot multiplication for dataframes
- classobject.cross(dm1,dm2) - multiplies dataframes by the broadcasting method
- classobject.row(rowindex) - returns a row by index
- ArithmeticMean(iterable) - for a single column; ArithmeticMean(lower,upper,frequency) - for a class distribution
- GeometricMean(iterable) - for a single column; GeometricMean(lower,upper,frequency) - for a class distribution
- HarmonicMean(iterable) - for a single column; HarmonicMean(lower,upper,frequency) - for a class distribution
- mode(iterable) - for a single column; mode(lower,upper,frequency) - for a class distribution
- median(iterable) - for a single column; median(lower,upper,frequency) - for a class distribution
- sample_standard_deviation(iterator) - for a single column
- sample_variance(li)
- co_standard_deviation(li,pi)
- co_variance(li,pi)
- pearson_correlation_coefficient(li,pi)
- normalize(df) - scales the data to balance the dominance of individual features
- reference_reverse_normalize(ypure,y_pred) - reverses the scaling of data, using the actual and normalized values as reference
- MLVR(XDATA, YDATA, xreference=0, residual=1, xlabel='', ylabel='', title='', alpha=0.01, iters=1000, plot=1) - performs multivariate linear regression. Properties:
  - XDATA = the feature DataFrame
  - YDATA = the target DataFrame
  - xreference = 1/0 -> the column index in XDATA used for plotting the graph
  - xlabel = label for X in the graph
  - ylabel = label for Y in the graph
  - title = title for the graph
  - alpha = learning rate for the model
  - iters = the number of iterations used to train the model
- compute_distance(p1, p2) - returns the distance between two points
- estimate_coef(x, y) - estimates the coefficients m, c of the straight line mx+c
- get_max_area()
- get_max_rectangele()
- give_time_series(x, y) - rearranges X,Y value pairs or points according to the order of X
- linear_regression(x, y, title='', xlabel='X', ylabel='Y') - performs simple linear regression
- maxResidual(pure, pred) - returns the maximum error distance or residual
- max_min_rectangle(x, y) - plots a rectangle using the max and min points of a distribution
- meanResidual(pure, pred) - returns the average error distance or residual
- minResidual(pure, pred) - returns the minimum error distance or residual
- plot_eachpoint_connected(x, y) - plots every point of a distribution connected to every other point
- plot_error_distance(x, y_pred, y_actual) - plots the error distance or residual
- plot_regression_line(x, y, b, title='', xlabel='X', ylabel='Y') - plots the prediction line using simple linear regression
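The residual and distance helpers at the end of the list can be illustrated with a minimal sketch. It assumes that a residual is the absolute difference between an actual value and its prediction, and that the distance between two points is Euclidean; this README confirms neither assumption. The function names below are hypothetical stand-ins, not the library's code.

```python
import math


def residuals(actual, predicted):
    """Absolute error for each (actual, predicted) pair (assumed definition)."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


def euclidean_distance(p1, p2):
    """Straight-line distance between two points of equal dimension (assumed)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))


actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 9.0]
errors = residuals(actual, predicted)

print("max residual :", max(errors))
print("mean residual:", sum(errors) / len(errors))
print("min residual :", min(errors))
print("distance     :", euclidean_distance((0, 0), (3, 4)))
```

The maximum, mean, and minimum are taken over the same list of errors, which may be how maxResidual, meanResidual, and minResidual relate to each other.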
1. Statistical
2. DataFrame
3. Machine Learning
Input Formats for DataFrame
Single Column
1921
1925
1945
1947
Multiple Column
serial,name,fair
11,20,34
21,30,4
31,40,33
41,50,7
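To show how the multiple-column format above could map onto the column-wise list representation, here is a minimal sketch using only the standard library. It assumes the first line is a header row and that each inner list holds one column; `read_columns` is a hypothetical helper, not the library's read_csv.

```python
import csv
import io

# The multiple-column sample from this README, held in memory for the demo.
SAMPLE = """serial,name,fair
11,20,34
21,30,4
31,40,33
41,50,7
"""


def read_columns(text):
    """Split CSV text into (header, columns); cells stay as strings."""
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text)) if row]
    header, body = rows[0], rows[1:]
    # zip(*body) regroups the row-wise cells into one list per column.
    columns = [list(column) for column in zip(*body)]
    return header, columns


header, columns = read_columns(SAMPLE)
print(header)
print(columns)
```

The cells are still strings at this point, so they need a numeric conversion before any arithmetic.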
- Always convert the list or dataframe object to a numeric type, preferably float, with conv_type(var,type) before using the statistical functions, to avoid errors.
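The reason for this advice can be seen in a small standalone sketch: values read from a text file are strings, and strings compare alphabetically rather than numerically. `to_numbers` is a hypothetical stand-in for conv_type, written under the assumption that the type is selected by name.

```python
def to_numbers(values, type_name="float"):
    """Convert every item to the named type (hypothetical conv_type stand-in)."""
    casters = {"int": int, "float": float, "str": str}
    cast = casters[type_name]
    return [cast(value) for value in values]


raw = ["11", "21", "31", "41"]           # cells as read from a text file
numbers = to_numbers(raw)
mean = sum(numbers) / len(numbers)

text_max = max(["9", "10"])              # string comparison picks "9"
num_max = max(to_numbers(["9", "10"]))   # numeric comparison picks 10.0

print(mean, text_max, num_max)
```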
Math functions
print(sum_all(2,3,4))
print(nroot(2,4))
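For readers who want to see what these two calls compute, the following is a minimal sketch and not the library's implementation. It assumes non-negative input for the root and uses hypothetical names.

```python
def nroot_sketch(n, number):
    """n-th root of a non-negative number via a fractional exponent."""
    return number ** (1.0 / n)


def sum_all_sketch(*args):
    """Sum of all positional arguments."""
    return sum(args)


print(sum_all_sketch(2, 3, 4))  # 9
print(nroot_sketch(2, 4))       # 2.0
```

Roots that are not exactly representable in floating point can come out slightly off, so compare them with a tolerance rather than with ==.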
Utility functions
print_dic(name="Shaon",Position="Data Scientist")
All properties of the DataFrame class
dm1 = DataFrame().read_csv('sample_inputs/input1.txt')
print('List representation',dm1.tolist)
print(dm1)
print(dm1[1])
print("accessing cell")
print(dm1[1][0])
print(dm1['lower'])
print('Columns',dm1.columns)
print('Shape',dm1.shape)
print('renaming columns')
dm1.columns=['low','up','freq']
print(dm1)
print('empty DataFrame of 2x3',dm1.create_dataframe(2,3))
print('element with 1 , DataFrame of 2x3',dm1.create_dataframe(2,3,elm=1))
print('Transposed and returned',dm1.T)
print(dm1)
print('Transposed itself and also returned',dm1.transpose(change_self = True))
print(dm1)
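The transpose step in the example above can be pictured with plain nested lists. This sketch assumes the list representation stores one inner list per column, as described earlier in this README; `transpose_sketch` is a hypothetical helper and returns a new list instead of changing its input.

```python
def transpose_sketch(table):
    """Swap the two axes of a rectangular nested list."""
    return [list(group) for group in zip(*table)]


columns = [[2, 3, 4], [2, 5, 2]]            # two columns, three rows each
rows = transpose_sketch(columns)            # one inner list per row
shape = (len(columns[0]), len(columns))     # (rows, columns)

print(rows)
print(shape)
print(transpose_sketch(rows) == columns)    # transposing twice restores the input
```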
DataFrame class datatype conversion
dm1 = DataFrame().read_csv('sample_inputs/matrice.txt')
print('List representation')
print(dm1.tolist)
print('returned result of conversion')
dm2 = dm1.conv_type('int')
print(dm2)
print('print actual dataframe')
print(dm1)
print('conversion,returned and changing itself')
dm1.conv_type('int',change_self=True)
print(dm1)
Add two dataframes by rows
my_data = DataFrame().read_csv('sample_inputs/home.txt',columns=["size","bedroom","price"])
my_data.conv_type('float',change_self=True)
my_data.normalize(change_self=True)
X = DataFrame(dataframe = my_data[0:2])
ones = DataFrame().create_dataframe(X.framesize[0],1,elm=1.)
X = DataFrame(dataframe = X.T)
ones = DataFrame(dataframe = ones.T)
print(ones.framesize,X.framesize)
X = DataFrame(dataframe=DataFrame().concatenate(ones,X,axis=0))
print(X.tolist)
Add two dataframes by columns
my_data = DataFrame().read_csv('sample_inputs/home.txt',columns=["size","bedroom","price"])
my_data.conv_type('float',change_self=True)
my_data.normalize(change_self=True)
X = DataFrame(dataframe= my_data[0:2])
ones = DataFrame().create_dataframe(X.framesize[0],1,elm=1.)
X = DataFrame(dataframe=DataFrame().concatenate(ones,X,axis=1))
print(X)
Subtract two dataframes of the same size
my_data = DataFrame().read_csv('sample_inputs/home.txt',columns=["size","bedroom","price"])
my_data.conv_type('float',change_self=True)
X = DataFrame(dataframe= my_data[0:2])
Y = DataFrame(dataframe= my_data[1:3])
print(DataFrame().substract(X, Y))
nth power of a dataframe
my_data = DataFrame().read_csv('sample_inputs/home.txt',columns=["size","bedroom","price"])
my_data.conv_type('float',change_self=True)
my_data.power(2)
print(my_data)
Summation over rows or columns of a dataframe
#adding rows
liq = DataFrame().mat_dot( DataFrame(dataframe=[[1,4],[2,5],[3,6]]), DataFrame(dataframe=[[7,9,11],[8,10,12]]) )
print(DataFrame(dataframe = liq.tolist ))
print(DataFrame(dataframe = liq.tolist ).sum(axis=1))
#adding columns
liq = DataFrame().mat_dot( DataFrame(dataframe=[[1,4],[2,5],[3,6]]), DataFrame(dataframe=[[7,9,11],[8,10,12]]) )
print(DataFrame(dataframe = liq.tolist ))
print(DataFrame(dataframe = liq.tolist ).sum(axis=0))
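The difference between the two summations can be sketched with plain nested lists. This README does not state which axis value selects which direction, so the sketch uses explicit, hypothetical function names and a row-wise matrix instead of an axis argument.

```python
def sum_rows(matrix):
    """One total per row."""
    return [sum(row) for row in matrix]


def sum_columns(matrix):
    """One total per column; zip(*matrix) walks the matrix column by column."""
    return [sum(column) for column in zip(*matrix)]


matrix = [[1, 2, 3],
          [4, 5, 6]]

print(sum_rows(matrix))     # [6, 15]
print(sum_columns(matrix))  # [5, 7, 9]
```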
Statistics on Single Column Data
from RML_DataFrame import *
from RML_Stat import *
df3 = DataFrame().read_csv('sample_inputs/input2.txt')
print(df3)
print("ArithmeticMean = %s" % ArithmeticMean(conv_type(df3[0],"int")) )
print("GeometricMean = %s" % GeometricMean(conv_type(df3[0],"int")))
print("HarmonicMean = %s" % HarmonicMean(df3[0]))
print("Mode = %s" % mode(df3[0]))
print("Median = %s" % median(df3[0]))
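As a reference for the quantities printed above, here is a minimal sketch of the textbook definitions for a single column. It is not the library's code: the function names are hypothetical, the values are assumed to be positive numbers, and the sample data is made up for illustration.

```python
import math
from collections import Counter


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # Averaging logarithms avoids overflow from multiplying many values.
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)


def median_value(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode_value(values):
    # most_common(1) returns the single most frequent item and its count.
    return Counter(values).most_common(1)[0][0]


data = [2, 4, 4, 8]
print(arithmetic_mean(data), geometric_mean(data), harmonic_mean(data))
print(median_value(data), mode_value(data))
```

For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean; this gives a quick sanity check on any implementation.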
Statistics on Class Distribution or Grouped Data
from RML_DataFrame import *
from RML_Stat import *
df = DataFrame().read_csv('sample_inputs/input1.txt')
print(df)
lower, upper, frequency = conv_type(df['lower'],"int"),conv_type(df['upper'],"int"),conv_type(df['frequency'],"int")
print("ArithmeticMean = %s" % ArithmeticMean(lower=lower,upper=upper,frequency=frequency))
print("GeometricMean = %s" % GeometricMean(lower=lower,upper=upper,frequency=frequency))
print("HarmonicMean = %s" % HarmonicMean(lower=lower,upper=upper,frequency=frequency))
print("Mode = %s" % mode(lower=lower,upper=upper,frequency=frequency))
print("Median = %s" % median(lower=lower,upper=upper,frequency=frequency))
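For grouped data, the usual textbook approach replaces each class by its midpoint for the mean and interpolates inside the median class for the median. The sketch below follows those formulas under the assumption that classes are contiguous and sorted; whether the library uses the same formulas is not confirmed here, and the names and sample classes are hypothetical.

```python
def grouped_mean(lower, upper, frequency):
    """Frequency-weighted mean of the class midpoints."""
    midpoints = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
    weighted = sum(f * m for f, m in zip(frequency, midpoints))
    return weighted / sum(frequency)


def grouped_median(lower, upper, frequency):
    """Linear interpolation inside the class that contains the N/2-th item."""
    half = sum(frequency) / 2
    cumulative = 0
    for lo, hi, f in zip(lower, upper, frequency):
        if f > 0 and cumulative + f >= half:
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("frequencies must sum to a positive number")


lower = [0, 10, 20]
upper = [10, 20, 30]
frequency = [2, 5, 1]

print(grouped_mean(lower, upper, frequency))    # 13.75
print(grouped_median(lower, upper, frequency))  # 14.0
```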
Simple Linear Regression from text file
from RML_DataFrame import *
from RML_ML import *
dm = DataFrame()
dm.read_csv('sample_inputs/matrice2.txt')
linear_regression(conv_type(dm[0],'int'),conv_type(dm[1],'int'))
Simple Linear Regression from text file by dataframe representation
from RML_DataFrame import *
from RML_ML import *
dm = DataFrame().read_csv('sample_inputs/matrice2.txt')
linear_regression(conv_type(dm[0],'int'),conv_type(dm[1],'int'))
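The fit behind a simple linear regression can be sketched with the ordinary least-squares formulas for a line y = mx + c. This is an illustration under that assumption, not the library's estimate_coef; the function name and sample points are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope m and intercept c for y = m*x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


x = [1, 2, 3, 4]
y = [3, 5, 7, 9]          # points lying exactly on y = 2x + 1
m, c = fit_line(x, y)
print(m, c)               # 2.0 1.0
```

The formula divides by the spread of x, so it fails when every x value is identical.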
Multivariate Linear Regression from text file by dataframe representation
my_data = DataFrame().read_csv('sample_inputs/home.txt',columns=["size","bedroom","price"])
my_data.conv_type('float',change_self=True)
my_data.normalize(change_self=True)
XDATA = DataFrame(dataframe= my_data[0:2],columns=['size','bedroom'])
YDATA = DataFrame(dataframe= [my_data[2]])
multivariant_linear_regression(XDATA,YDATA,xreference=0,residual=0,xlabel='size',ylabel='price',title='Multivariant Linear regression')
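The alpha (learning rate) and iters (iteration count) parameters suggest that the model is trained by gradient descent. The sketch below shows batch gradient descent for a linear model in pure Python under that assumption. It is not the library's implementation; the function name, the sample data, and the chosen alpha and iters values are all hypothetical.

```python
def gradient_descent(features, targets, alpha=0.1, iters=5000):
    """Fit targets ~ theta[0] + theta[1]*x1 + theta[2]*x2 + ... ."""
    # Prepend a constant 1.0 to each row so theta[0] acts as the intercept.
    rows = [[1.0] + list(row) for row in features]
    n = len(rows)
    theta = [0.0] * len(rows[0])
    for _ in range(iters):
        # Prediction error of the current theta on every sample.
        errors = [sum(t * v for t, v in zip(theta, row)) - target
                  for row, target in zip(rows, targets)]
        # Gradient of the mean squared error (up to a constant factor).
        gradient = [sum(e * row[j] for e, row in zip(errors, rows)) / n
                    for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


# Samples generated from y = 1 + 2*x1 + 3*x2 with features already in [0, 1].
features = [[0, 0], [1, 0], [0, 1], [1, 1]]
targets = [1, 3, 4, 6]
theta = gradient_descent(features, targets)
print([round(t, 3) for t in theta])
```

Keeping the features in a small range, as the normalize step in the example does, is what allows a single learning rate to work for every coefficient.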
Matrix dot multiplication from text file
dm1 = read_csv('sample_inputs/matrice.txt')
dm2 = read_csv('sample_inputs/matrice3.txt')
res = mat_dot(dm1,dm2)
print(res)
Matrix dot multiplication from text file by DataFrame representation
from RML_DataFrame import *
dm1 = DataFrame().read_csv('sample_inputs/matrice.txt')
dm2 = DataFrame().read_csv('sample_inputs/matrice3.txt')
print( DataFrame().mat_dot(dm1,dm2) )#creates another object
print( mat_dot(dm1.dataframe,dm2.dataframe) )#efficient
print( mat_dot(dm1.tolist,dm2.tolist) )#efficient
Matrix dot multiplication from matrix representation
from RML_DataFrame import *
dm1 = matrice( [[4,8],
[0,2],
[1,6]] )
dm2 = matrice( [[5,2],
[9,4]] )
res = mat_dot(dm1,dm2)
print(res)
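To make the arithmetic of this example concrete, here is a minimal sketch of the dot product on plain row-wise nested lists, using the same two matrices. It is a hypothetical helper, not the library's mat_dot, and it does not reproduce the library's column-wise storage.

```python
def mat_dot_sketch(a, b):
    """Matrix product of two row-wise nested lists."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    # Each result cell is the dot product of a row of a and a column of b.
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]


a = [[4, 8],
     [0, 2],
     [1, 6]]
b = [[5, 2],
     [9, 4]]

print(mat_dot_sketch(a, b))  # [[92, 40], [18, 8], [59, 26]]
```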
Rectangle with Max, Min point
from RML_DataFrame import *
from RML_ML import *
dm1 = DataFrame().read_csv('sample_inputs/youtube.txt')
max_min_rectangle(dm1['length(s)'],dm1['views'])
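One plausible reading of a rectangle built from the max and min points is the axis-aligned bounding box of the data. The sketch below computes its corners and area under that assumption, without plotting. The function name and sample values are hypothetical, and the library may define the rectangle differently.

```python
def bounding_rectangle(x, y):
    """Corners of the axis-aligned box around the points, counter-clockwise."""
    left, right = min(x), max(x)
    bottom, top = min(y), max(y)
    return [(left, bottom), (right, bottom), (right, top), (left, top)]


x = [1, 4, 2]
y = [3, 1, 5]
corners = bounding_rectangle(x, y)
width = corners[1][0] - corners[0][0]
height = corners[2][1] - corners[1][1]

print(corners)          # [(1, 1), (4, 1), (4, 5), (1, 5)]
print(width * height)   # 12
```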
git config --global user.name "shaon"
git config --global user.email "smazoomder@gmail.com"
git remote add origin https://github.com/ShaonMajumder/rml.git
git add .
git commit -m "This a commit"
git push -u origin master
git pull origin master
No module found due to root access
sudo jupyter notebook --allow-root
Explain how to run the automated tests for this system
Explain what these tests test and why
Give an example
Library file names must start with RML; other files must not start with RML
Add additional notes about how to deploy this on a live system
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
We use SemVer for versioning. For the versions available, see the tags on this repository.
- Shaon Majumder - Github
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE.md file for details
- Sabbir Amin - for guidance and mentoring