Robist Machine Library

A library for general statistics and machine learning (ML) tasks, with in-depth detail.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

Requirements

Packages: Math

Installing

Follow these steps to get a development environment running.

To set up the project on your local machine, run the following command in Git Bash:

	git clone https://github.com/ShaonMajumder/rml.git

Function Manual

For Math

nroot(n,number) - returns the nth root of a number
sum_all(*args) - returns the sum of all numbers passed as comma-separated arguments
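
The two helpers above could be sketched in plain Python as follows. This is an illustrative re-implementation, not the library's source: the argument order follows the signatures listed above, while the error handling for negative numbers and for n = 0 is an assumption.

```python
def nroot(n, number):
    """Return the n-th root of a non-negative number."""
    if n == 0:
        raise ValueError("the 0th root is undefined")
    if number < 0:
        # Assumption: this sketch does not handle roots of negative numbers.
        raise ValueError("this sketch only handles non-negative numbers")
    return number ** (1.0 / n)


def sum_all(*args):
    """Return the sum of all positional arguments."""
    total = 0
    for value in args:
        total += value
    return total


print(sum_all(2, 3, 4))  # 9
print(nroot(2, 4))       # 2.0
```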

For Utility

auto_include(starts_with='RML',extension='py') - automatically includes all .py files whose names start with 'RML'
print_dic(**kwargs) - prints key="value" pairs
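
As a rough illustration of the file-matching idea behind auto_include, the sketch below only lists the module names that match a prefix and an extension; it does not import anything. The helper name find_modules is hypothetical, and how the library actually discovers and includes files is an assumption. A small print_dic look-alike is included for completeness.

```python
import os
import tempfile


def find_modules(directory=".", starts_with="RML", extension="py"):
    """Return sorted module names for files named '<starts_with>*.<extension>'."""
    suffix = "." + extension
    names = []
    for filename in os.listdir(directory):
        if filename.startswith(starts_with) and filename.endswith(suffix):
            names.append(filename[: -len(suffix)])  # drop the extension
    return sorted(names)


def print_dic(**kwargs):
    """Print each keyword argument as key="value"."""
    for key, value in kwargs.items():
        print('%s="%s"' % (key, value))


# Demo in a throwaway directory so the result does not depend on where this runs.
with tempfile.TemporaryDirectory() as folder:
    for filename in ("RML_Stat.py", "RML_ML.py", "main.py", "RML_notes.txt"):
        open(os.path.join(folder, filename), "w").close()
    found = find_modules(folder)

print(found)  # ['RML_ML', 'RML_Stat']
print_dic(name="Shaon", Position="Data Scientist")
```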

For Sorting and Searching

binary_search(iterable,target) - returns the position of the target
quicksort(iterable) - returns the sorted iterable
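
For readers unfamiliar with the two algorithms, here is a minimal sketch of both. It is not the library's code: returning -1 for a missing target and returning a new list from quicksort are assumptions made for this example. Binary search requires its input to be sorted already.

```python
def quicksort(iterable):
    """Return a new sorted list using a simple three-way quicksort."""
    items = list(iterable)
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


def binary_search(iterable, target):
    """Return the index of target in a sorted sequence, or -1 if absent."""
    low, high = 0, len(iterable) - 1
    while low <= high:
        mid = (low + high) // 2
        if iterable[mid] == target:
            return mid
        if iterable[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1


data = quicksort([31, 4, 15, 9, 26])
print(data)                     # [4, 9, 15, 26, 31]
print(binary_search(data, 15))  # 2
```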

For Iterable

maxn(iterable) - returns the max number of an iterable
minn(iterable) - returns the min number of an iterable
conv_type(iterable,"type") - converts the items of an iterable to the given data type
sum(iterable) - returns the sum of an iterable of numbers
appends(iterable,object,index or empty) - appends an object at the end, or at a specific position, of an iterable and returns the result
swap(iterable, position1, position2) - swaps items according to their index and returns the new iterable
length(iterable) - returns the length of an iterable or string
getIndex(iterable,val) - returns the index of an item inside an iterable
getIndexes(iterable) - returns the indexes of an iterable

For String

strip(string,strip_chars) - removes whitespace, newlines, or any special characters listed in the strip_chars iterable from the string and returns the result
split(string,spliting_char = ',') - splits the string on the splitting character and returns the resulting list

For DataFrame

Functions

columns(df) - returns the columns of a dataframe object
read_csv(input_file_url) - reads a text file and returns it as a dataframe object
df_size(df) - returns the dataframe dimensions as row, column
create_empty_dataframe(m,n) - returns an empty list with the given dimensions
transpose(df) - returns the transposed matrix orientation of a data list or integer-indexed dictionary
matrice(df) - takes a matrix written in matrix representation and converts it to the machine-readable list
list_multiplication(dm1,dm2) - multiplies two lists of the same size
mat_dot(dm1,dm2) - matrix dot multiplication between two dictionaries or lists

Machine Readable Dataframe list representation:

	[column1=[c1row1,c1row2,c1row3], column2=[c2row1,c2row2,c2row3]]
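
To make the column-wise representation concrete, the sketch below implements transpose and a matrix product on plain lists of columns. It is an illustration under assumptions, not the library's implementation: in particular, returning the product in the same column-wise layout is an assumption.

```python
def transpose(columns):
    """Swap rows and columns of a list of equal-length lists."""
    return [list(group) for group in zip(*columns)]


def mat_dot(dm1, dm2):
    """Matrix product of two column-wise tables; the result is column-wise too."""
    left_rows = transpose(dm1)   # rows of the left matrix
    inner = len(dm1)             # number of columns of the left matrix
    if any(len(col) != inner for col in dm2):
        raise ValueError("columns of dm1 must match rows of dm2")
    # Each result column is built from one column of dm2.
    return [
        [sum(a * b for a, b in zip(row, col)) for row in left_rows]
        for col in dm2
    ]


left = [[1, 4], [2, 5], [3, 6]]    # 3 columns of 2 rows: a 2 x 3 matrix
right = [[7, 9, 11], [8, 10, 12]]  # 2 columns of 3 rows: a 3 x 2 matrix
print(transpose(left))       # [[1, 2, 3], [4, 5, 6]]
print(mat_dot(left, right))  # [[58, 139], [64, 154]]
```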

Class : (1)DataFrame

Creating Classobject

classobject = DataFrame(dataframe = [[2,3,4],[2,5,2]]) - assigns the dataframe from the Machine Readable Dataframe list representation
classobject = DataFrame().read_csv(file_link) - assigns the dataframe by reading a text file in CSV format

Class Properties

classobject - get dataframe representation
classobject[colindex] - access a column with index
classobject[colindex][rowindex] - access a cell with 2d index
classobject['columnname'] - access a column with index_name or index_string
classobject.tolist - get list representation
classobject.shape - Shape of dataframe
classobject.columns - Get column names
classobject.columns=['low','up','freq'] - renaming columns
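
The access patterns listed above (by column index, by cell, by column name) can be mimicked with a small stand-in class. MiniFrame below is hypothetical and exists only for this illustration; it is not the library's DataFrame, and reporting shape as (rows, columns) is an assumption based on the df_size description.

```python
class MiniFrame:
    """A tiny column-wise table supporting index and name lookups."""

    def __init__(self, dataframe, columns=None):
        self.data = [list(col) for col in dataframe]
        # Default column names are the integer positions.
        self.columns = list(columns) if columns else list(range(len(self.data)))

    def __getitem__(self, key):
        if isinstance(key, str):
            key = self.columns.index(key)  # translate a name to a position
        return self.data[key]

    @property
    def shape(self):
        rows = len(self.data[0]) if self.data else 0
        return (rows, len(self.data))

    @property
    def tolist(self):
        return [list(col) for col in self.data]

    @property
    def T(self):
        return MiniFrame(zip(*self.data))


frame = MiniFrame([[2, 3, 4], [2, 5, 2]], columns=["low", "up"])
print(frame[1])        # [2, 5, 2]
print(frame[1][0])     # 2
print(frame["low"])    # [2, 3, 4]
print(frame.shape)     # (3, 2)
print(frame.T.tolist)  # [[2, 2], [3, 5], [4, 2]]
```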

Class methods

classobject.create_dataframe(m,n,elm=None) - creates an m x n DataFrame filled with elm (None by default)
classobject.T - returns the transpose
classobject.transpose(change_self = True) - transposes the dataframe itself and also returns it
classobject.conv_type('int',change_self=True) - changes the data type of the dataframe
classobject.normalize() - normalizes the values of the dataframe
classobject.concatenate(classobject1,classobject2,axis=0) - joins two dataframes along the x or y axis
classobject.sum(axis=0) - sums the matrix over rows (axis=1) or columns (axis=0)
classobject.substract(dm1,dm2) - returns the result of subtracting two matrices
classobject.power(n) - raises the dataframe to the power of n and returns it
classobject.mat_dot(dm1,dm2) - matrix dot multiplication for dataframes
classobject.cross(dm1,dm2) - multiplies dataframes by the broadcasting method
classobject.row(rowindex) - returns a row by index

For Statistics

ArithmeticMean(iterable) - for a single column
ArithmeticMean(lower,upper,frequency) - for a class distribution
GeometricMean(iterable) - for a single column
GeometricMean(lower,upper,frequency) - for a class distribution
HarmonicMean(iterable) - for a single column
HarmonicMean(lower,upper,frequency) - for a class distribution
mode(iterable) - for a single column
mode(lower,upper,frequency) - for a class distribution
median(iterable) - for a single column
median(lower,upper,frequency) - for a class distribution
sample_standard_deviation(iterator) - for a single column
sample_variance(li)
co_standard_deviation(li,pi)
co_variance(li,pi)
pearson_correlation_coefficient(li,pi)
normalize(df) - scales the data so that no single feature dominates
reference_reverse_normalize(ypure,y_pred) - reverses the scaling of the data, using the actual and normalized values as reference
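
The difference between the single-column and class-distribution forms is easiest to see in code. The sketch below uses the textbook definitions: class midpoints weighted by frequency for grouped data, division by n - 1 for the sample variance, and the usual Pearson formula. The function names are hypothetical, and whether the library computes these quantities the same way is an assumption.

```python
import math


def arithmetic_mean(values=None, lower=None, upper=None, frequency=None):
    """Mean of a single column, or of grouped data via class midpoints."""
    if values is not None:
        return sum(values) / len(values)
    midpoints = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
    weighted = sum(m * f for m, f in zip(midpoints, frequency))
    return weighted / sum(frequency)


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


print(arithmetic_mean([1, 2, 3, 4]))  # 2.5
print(arithmetic_mean(lower=[0, 10, 20], upper=[10, 20, 30], frequency=[1, 2, 1]))  # 15.0
print(sample_variance([2, 4, 4, 4, 5, 5, 7, 9]))
print(pearson([1, 2, 3], [2, 4, 6]))
```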

For Machine Learning

MLVR(XDATA, YDATA, xreference=0, residual=1, xlabel='', ylabel='', title='', alpha=0.01, iters=1000, plot=1) - performs multivariate linear regression. Properties:
  XDATA = the feature dataframe
  YDATA = the target dataframe
  xreference = 1/0 -> the column index in XDATA used for plotting the graph
  xlabel = label for X in the graph
  ylabel = label for Y in the graph
  title = title for the graph
  alpha = learning rate for the model
  iters = the number of iterations to train the model
compute_distance(p1, p2) - returns the distance between two points
estimate_coef(x, y) - estimates the coefficients m and c of the straight line mx+c
get_max_area()
get_max_rectangele()
give_time_series(x, y) - rearranges X,Y value pairs or points according to X's order
linear_regression(x, y, title='', xlabel='X', ylabel='Y') - performs simple linear regression
maxResidual(pure, pred) - returns the maximum error distance or residual
max_min_rectangle(x, y) - plots a rectangle using the max and min points of a distribution
meanResidual(pure, pred) - returns the average error distance or residual
minResidual(pure, pred) - returns the minimum error distance or residual
plot_eachpoint_connected(x, y) - plots every point of a distribution connected with every other point
plot_error_distance(x, y_pred, y_actual) - plots the error distance or residual
plot_regression_line(x, y, b, title='', xlabel='X', ylabel='Y') - plots the prediction line from simple linear regression
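
As a sketch of what estimating m and c involves, the code below uses the closed-form least-squares solution and then summarizes the residuals. It is not taken from the library: the helper residuals is hypothetical, and treating a residual as the absolute difference between actual and predicted values is an assumption.

```python
def estimate_coef(x, y):
    """Least-squares slope m and intercept c for the line y = m*x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    m = s_xy / s_xx
    c = mean_y - m * mean_x
    return m, c


def residuals(pure, pred):
    """Absolute error between actual (pure) and predicted values."""
    return [abs(p - q) for p, q in zip(pure, pred)]


x = [0, 1, 2, 3]
y = [1, 3, 5, 8]
m, c = estimate_coef(x, y)
pred = [m * xi + c for xi in x]
errors = residuals(y, pred)

print("m =", m, "c =", c)
print("max residual  =", max(errors))
print("mean residual =", sum(errors) / len(errors))
print("min residual  =", min(errors))
```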

Functionality

1. Statistical
2. DataFrame
3. Machine Learning

Input and Output

Input Formats for DataFrame

Single Column

Multiple Column

serial, name,fair
11,20,34
21,30,4
31,40,33
41,50,7
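
To show how such a file maps onto the column-wise list representation, the sketch below parses the sample above with Python's standard csv module. The helper read_columns is hypothetical, and treating the first line as a header is an assumption; note that every value is still a string after parsing.

```python
import csv
import io

SAMPLE = """serial, name,fair
11,20,34
21,30,4
31,40,33
41,50,7
"""


def read_columns(text):
    """Parse CSV text into (header, list of columns); values stay strings."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    rows = [row for row in reader if row]  # skip blank lines
    columns = [list(col) for col in zip(*rows)]
    return header, columns


header, columns = read_columns(SAMPLE)
print(header)      # ['serial', 'name', 'fair']
print(columns[0])  # ['11', '21', '31', '41']
print(columns[2])  # ['34', '4', '33', '7']
```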

Cautions

Always convert the list or dataframe object to a numeric type, preferably float, with conv_type(var,type) before using the statistical functions, to avoid errors.
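
The reason for this caution is that values read from a text file are strings, and string operations silently give different results from numeric ones. The sketch below uses a stand-in conv_type written for this example; the set of supported type names and the flat-list handling are assumptions, not the library's behavior.

```python
def conv_type(iterable, type_name):
    """Convert every item of a flat iterable to the named type."""
    converters = {"int": int, "float": float, "str": str}
    if type_name not in converters:
        raise ValueError("unsupported type: %r" % type_name)
    convert = converters[type_name]
    return [convert(item) for item in iterable]


raw = ["11", "21", "31", "41"]  # as read from a text file
print(raw[0] + raw[1])          # 1121 (string concatenation, not addition)

numbers = conv_type(raw, "float")
print(sum(numbers))                 # 104.0
print(sum(numbers) / len(numbers))  # 26.0
```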

Example

Math functions

	print(sum_all(2,3,4))
	print(nroot(2,4))

Utility functions

	print_dic(name="Shaon",Position="Data Scientist")

DataFrame Class all properties

	dm1 = DataFrame().read_csv('sample_inputs/input1.txt')
	print('List presentation',dm1.tolist)
	print(dm1)
	print(dm1[1])
	print("accessing cell")
	print(dm1[1][0])
	print(dm1['lower'])
	print('Columns',dm1.columns)
	print('Shape',dm1.shape)
	print('renaming columns')
	dm1.columns=['low','up','freq']
	print(dm1)
	print('empty DataFrame of 2x3',dm1.create_dataframe(2,3))
	print('element with 1 , DataFrame of 2x3',dm1.create_dataframe(2,3,elm=1))

	print('Transposed and returned',dm1.T)
	print(dm1)
	print('Transposed itself and also returned',dm1.transpose(change_self = True))
	print(dm1)

Class DataFrame datatype conversion

	dm1 = DataFrame().read_csv('sample_inputs/matrice.txt')
	print('List presentation')
	print(dm1.tolist)
	print('returned result of conversion')
	dm2 = dm1.conv_type('int')
	print(dm2)
	print('print actual dataframe')
	print(dm1)
	print('conversion,returned and changing itself')
	dm1.conv_type('int',change_self=True)
	print(dm1)

Add two dataframes by rows

	my_data = DataFrame().read_csv('sample_inputs/home.txt',columns=["size","bedroom","price"])
	my_data.conv_type('float',change_self=True)
	my_data.normalize(change_self=True)
	X = DataFrame(dataframe = my_data[0:2])
	ones = DataFrame().create_dataframe(X.framesize[0],1,elm=1.)
	X = DataFrame(dataframe = X.T)
	ones = DataFrame(dataframe = ones.T)
	print(ones.framesize,X.framesize)
	X = DataFrame(dataframe=DataFrame().concatenate(ones,X,axis=0))
	print(X.tolist)

Add two dataframes by columns

	my_data = DataFrame().read_csv('sample_inputs/home.txt',columns=["size","bedroom","price"])
	my_data.conv_type('float',change_self=True)
	my_data.normalize(change_self=True)
	
	X = DataFrame(dataframe= my_data[0:2])
	
	ones = DataFrame().create_dataframe(X.framesize[0],1,elm=1.)
	
	
	X = DataFrame(dataframe=DataFrame().concatenate(ones,X,axis=1))
	print(X)

Subtract two dataframes of the same size

	my_data = DataFrame().read_csv('sample_inputs/home.txt',columns=["size","bedroom","price"])
	my_data.conv_type('float',change_self=True)
	X = DataFrame(dataframe= my_data[0:2])
	Y = DataFrame(dataframe= my_data[1:3])
	print(DataFrame().substract(X, Y))

nth power of a dataframe

	my_data = DataFrame().read_csv('sample_inputs/home.txt',columns=["size","bedroom","price"])
	my_data.conv_type('float',change_self=True)
	my_data.power(2)
	print(my_data)

Summation over rows or columns of a dataframe

	#adding rows
	liq = DataFrame().mat_dot( DataFrame(dataframe=[[1,4],[2,5],[3,6]]), DataFrame(dataframe=[[7,9,11],[8,10,12]]) )
	print(DataFrame(dataframe =  liq.tolist ))
	print(DataFrame(dataframe =  liq.tolist ).sum(axis=1))
	#adding columns
	liq = DataFrame().mat_dot( DataFrame(dataframe=[[1,4],[2,5],[3,6]]), DataFrame(dataframe=[[7,9,11],[8,10,12]]) )
	print(DataFrame(dataframe =  liq.tolist ))
	print(DataFrame(dataframe =  liq.tolist ).sum(axis=0))

Statistics on Single Column Data

	from RML_DataFrame import *
	from RML_Stat import *
	df3 = DataFrame().read_csv('sample_inputs/input2.txt')
	print(df3)
	print("ArithmeticMean = %s" % ArithmeticMean(conv_type(df3[0],"int")) )
	print("GeometricMean = %s" % GeometricMean(conv_type(df3[0],"int")))
	print("HarmonicMean = %s" % HarmonicMean(df3[0]))
	print("Mode = %s" % mode(df3[0]))
	print("Median = %s" % median(df3[0]))

Statistics on Class Distribution or Grouped Data

	from RML_DataFrame import *
	from RML_Stat import *
	df = DataFrame().read_csv('sample_inputs/input1.txt')
	print(df)
	lower, upper, frequency = conv_type(df['lower'],"int"),conv_type(df['upper'],"int"),conv_type(df['frequency'],"int")
	print("ArithmeticMean = %s" % ArithmeticMean(lower=lower,upper=upper,frequency=frequency))
	print("GeometricMean = %s" % GeometricMean(lower=lower,upper=upper,frequency=frequency))
	print("HarmonicMean = %s" % HarmonicMean(lower=lower,upper=upper,frequency=frequency))
	print("Mode = %s" % mode(lower=lower,upper=upper,frequency=frequency))
	print("Median = %s" % median(lower=lower,upper=upper,frequency=frequency))

Simple Linear Regression from textfile

	from RML_DataFrame import *
	from RML_ML import *
	dm = DataFrame()
	dm.read_csv('sample_inputs/matrice2.txt')
	linear_regression(conv_type(dm[0],'int'),conv_type(dm[1],'int'))

Simple Linear Regression from textfile by dataframe representation

	from RML_DataFrame import *
	from RML_ML import *
	dm = DataFrame().read_csv('sample_inputs/matrice2.txt')
	linear_regression(conv_type(dm[0],'int'),conv_type(dm[1],'int'))

Multivariate Linear Regression from textfile by dataframe representation

	my_data = DataFrame().read_csv('sample_inputs/home.txt',columns=["size","bedroom","price"])
	
	my_data.conv_type('float',change_self=True)
	my_data.normalize(change_self=True)
	XDATA = DataFrame(dataframe= my_data[0:2],columns=['size','bedroom'])
	YDATA = DataFrame(dataframe= [my_data[2]])
	
	multivariant_linear_regression(XDATA,YDATA,xreference=0,residual=0,xlabel='size',ylabel='price',title='Multivariant Linear regression')
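
The alpha and iters parameters suggest an iterative fit such as batch gradient descent. The sketch below shows that technique on plain lists so the role of each parameter is visible. It is an assumption that the library trains this way; gradient_descent is a hypothetical name, and the leading column of ones plays the same role as the ones column built in the concatenation examples.

```python
def gradient_descent(features, targets, alpha=0.01, iters=1000):
    """Fit targets ~ bias + w1*x1 + w2*x2 + ... by batch gradient descent.

    features is a list of rows; returns [bias, w1, w2, ...].
    """
    rows = [[1.0] + list(row) for row in features]  # prepend the bias column
    n = len(rows)
    theta = [0.0] * len(rows[0])
    for _ in range(iters):
        # Prediction error for every row under the current coefficients.
        errors = [sum(t * v for t, v in zip(theta, row)) - y
                  for row, y in zip(rows, targets)]
        # Gradient of the mean squared error (up to a constant factor).
        gradient = [sum(e * row[j] for e, row in zip(errors, rows)) / n
                    for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
    return theta


features = [[0, 0], [1, 0], [0, 1], [1, 1]]
targets = [1 + 2 * a + 3 * b for a, b in features]  # exact linear relation
theta = gradient_descent(features, targets, alpha=0.1, iters=2000)
print([round(t, 3) for t in theta])  # [1.0, 2.0, 3.0]
```

A larger alpha converges in fewer iterations but diverges if it is too large for the scale of the features, which is one reason the examples normalize the data first.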

Matrix dot multiplication from textfile

	dm1 = read_csv('sample_inputs/matrice.txt')
	dm2 = read_csv('sample_inputs/matrice3.txt')
	res = mat_dot(dm1,dm2)
	print(res)

Matrix dot multiplication from textfile by Dataframe representation

	from RML_DataFrame import *
	dm1 = DataFrame().read_csv('sample_inputs/matrice.txt')
	dm2 = DataFrame().read_csv('sample_inputs/matrice3.txt')
	print(  DataFrame().mat_dot(dm1,dm2)  )#creates another object
	print(  mat_dot(dm1.dataframe,dm2.dataframe)  )#efficient
	print(  mat_dot(dm1.tolist,dm2.tolist)  )#efficient

Matrix dot multiplication from matrix representation

	from RML_DataFrame import *
	dm1 = matrice( [[4,8],
	                [0,2],
	                [1,6]] )
	dm2 = matrice( [[5,2],
	                [9,4]] )
	res = mat_dot(dm1,dm2)
	print(res)

Rectangle with Max, Min point

	from RML_DataFrame import *
	from RML_ML import *
	dm1 = DataFrame().read_csv('sample_inputs/youtube.txt')
	max_min_rectangle(dm1['length(s)'],dm1['views'])

For Repository Managers

Configs

git config --global user.name "shaon"
git config --global user.email "smazoomder@gmail.com"

For adding the Remote Host

git remote add origin https://github.com/ShaonMajumder/rml.git

For sending updates from client to remote host

git add .
git commit -m "This a commit"
git push -u origin master

For receiving updates from remote host to client

git pull origin master

Troubleshooting

No module found due to root access

sudo jupyter notebook --allow-root

Rules

Library file names must start with RML; no other file name may start with RML.

Deployment

Add additional notes about how to deploy this on a live system

Built With

Dropwizard - The web framework used
Maven - Dependency Management
ROME - Used to generate RSS Feeds

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

Shaon Majumder - Github

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments

Sabbir Amin - For guidance and mentoring

