Feature/dataset parser #6

Merged
merged 11 commits into from
May 25, 2021

Conversation

SAI-Aghylas
Collaborator

No description provided.

# Conflicts:
#	src/main/scala/io/github/jsarni/CaraStage/CaraStage.scala
#	src/main/scala/io/github/jsarni/CaraStage/ModelStage/TestStage.scala
# Conflicts:
#	src/main/scala/io/github/jsarni/CaraStage/CaraStage.scala
#	src/main/scala/io/github/jsarni/CaraStage/ModelStage/LogisticRegression.scala
#	src/test/scala/io/github/jsarni/CaraStage/ModelStage/LogisticRegressionTest.scala
case _ : Any if field.getClass == Array[Array[Short]]().getClass => stage.getClass.getMethod(methodeName, field.asInstanceOf[Array[Array[Short]]].getClass )
case _ : Any if field.getClass == Array[Array[Char]]().getClass => stage.getClass.getMethod(methodeName, field.asInstanceOf[Array[Array[Char]]].getClass )
case _ : Any if field.getClass == Array[Array[Byte]]().getClass => stage.getClass.getMethod(methodeName, field.asInstanceOf[Array[Array[Byte]]].getClass )
case _ : Any if field.getClass == Array[Array[Long]]().getClass => stage.getClass.getMethod(methodeName, field.asInstanceOf[Array[Array[Long]]].getClass )
Owner

Add line breaks after the arrows.

Collaborator Author

Done.

@MapperConstructor
def this(params: Map[String, String]) = {
this(
params.get("InputCol").map(_.toString()),
Owner

Remove the toString calls where they are not needed.

Collaborator Author

Done.
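
A minimal sketch of the point behind this comment, assuming params is a Map[String, String] as in the constructor signature quoted above: Map.get already returns an Option[String], so toString adds nothing for string parameters, and a conversion is only needed for other types. The ParamsDemo object and the "Threshold" and "OutputCol" keys are hypothetical names used for illustration only.

```scala
object ParamsDemo {
  // Hypothetical parameter map, shaped like the one passed to the constructor.
  val params: Map[String, String] = Map("InputCol" -> "text", "Threshold" -> "0.5")

  // Already an Option[String]; no .map(_.toString) required.
  val inputCol: Option[String] = params.get("InputCol")

  // A conversion is only needed when the target type is not String.
  // toDouble throws NumberFormatException on malformed input.
  val threshold: Option[Double] = params.get("Threshold").map(_.toDouble)

  // An absent key simply yields None.
  val missing: Option[String] = params.get("OutputCol")
}
```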

def build(): Try[PipelineStage]= Try{
val datasetFeature=new fromSparkML()
val definedFields = this.getClass.getDeclaredFields
.filter(f => f.get(this).asInstanceOf[Option[Any]].isDefined)
Owner

Put this on a single line, or format it better.

Collaborator Author

Done.

val values = definedFields.map(f => f.get(this))
val zipFields = names zip values

zipFields.map(f=> getMethode(datasetFeature,f._2 match {case Some(s) => s },f._1)
Owner

Tidy all of this up a bit.

val values = definedFields.map(f => f.get(this))
val zipFields = names zip values

zipFields.map(f=> getMethode(datasetFeature,f._2 match {case Some(s) => s },f._1)
Owner

Same here.

Collaborator Author

Done.
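
One possible way to tidy the zip-and-match code commented on above, offered as a sketch rather than as the code that was merged: pair each declared field with its value, then keep only the Some values with collect, which avoids the non-exhaustive `match {case Some(s) => s }`. DemoParams and DefinedFields are hypothetical names, and the sketch assumes, like the quoted build method, that stage parameters are held in Option fields.

```scala
// Hypothetical stand-in for a stage wrapper whose parameters are all Options.
class DemoParams(val inputCol: Option[String], val threshold: Option[Double])

object DefinedFields {
  // Returns the name and unwrapped value of every field that holds a Some.
  def of(target: AnyRef): Map[String, Any] = {
    val namedValues = target.getClass.getDeclaredFields.toSeq.map { field =>
      field.setAccessible(true) // the fields behind Scala vals are private
      field.getName -> field.get(target)
    }
    // collect drops None values and any field that is not an Option at all.
    namedValues.collect { case (name, Some(value)) => name -> value }.toMap
  }
}
```

For example, `DefinedFields.of(new DemoParams(Some("text"), None))` keeps only the inputCol entry.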

import java.lang.NumberFormatException

class BinarizerTest extends TestBase {
"Binarizer build Success" should "build new binarizer with parametres given on the Map and be the same with SparkMl Binarizer" in {
Owner

Tidy all of this up.

Collaborator Author

Done.

stage.getClass.getMethod(methodeName, field.asInstanceOf[Array[Array[Byte]]].getClass )
case _ : Any if field.getClass == Array[Array[Long]]().getClass =>
stage.getClass.getMethod(methodeName, field.asInstanceOf[Array[Array[Long]]].getClass )
case _ : Any if field.getClass == Array[Boolean]().getClass => stage.getClass.getMethod(methodeName, field.asInstanceOf[Array[Boolean]].getClass )
case _ : Any if field.getClass == Array[Double]().getClass => stage.getClass.getMethod(methodeName, field.asInstanceOf[Array[Double]].getClass )
case _ : Any if field.getClass == Array[String]().getClass => stage.getClass.getMethod(methodeName, field.asInstanceOf[Array[String]].getClass )
Owner

Do the same here.

Collaborator Author

done
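
A side note on the dispatch reviewed in this thread: every array case quoted above ends up calling getMethod with the runtime class of the field, so the cases could in principle be collapsed into a single lookup. The sketch below is an illustration under assumptions, not the getMethode that was merged. DemoStage and SetterLookup are hypothetical names, and DemoStage stands in for a Spark ML stage so the example runs without Spark.

```scala
import java.lang.reflect.Method

// Hypothetical stand-in for a Spark ML stage with fluent setters.
class DemoStage {
  var threshold: Double = 0.0
  var labels: Array[String] = Array.empty
  def setThreshold(value: Double): this.type = { threshold = value; this }
  def setLabels(value: Array[String]): this.type = { labels = value; this }
}

object SetterLookup {
  // Values typed as Any arrive boxed, but setters declare primitive
  // parameters, so boxed classes are mapped back to their primitive types.
  private val primitiveOf: Map[Class[_], Class[_]] = Map(
    classOf[java.lang.Boolean] -> java.lang.Boolean.TYPE,
    classOf[java.lang.Integer] -> java.lang.Integer.TYPE,
    classOf[java.lang.Long]    -> java.lang.Long.TYPE,
    classOf[java.lang.Double]  -> java.lang.Double.TYPE,
    classOf[java.lang.Float]   -> java.lang.Float.TYPE
  )

  // Array classes (String[], short[][], ...) are already exact parameter
  // types, so they need no dedicated case.
  def findSetter(stage: AnyRef, methodName: String, value: Any): Method = {
    val runtimeClass: Class[_] = value.asInstanceOf[AnyRef].getClass
    stage.getClass.getMethod(methodName, primitiveOf.getOrElse(runtimeClass, runtimeClass))
  }

  // Looks up the setter and invokes it with the given value.
  def set(stage: AnyRef, methodName: String, value: Any): Unit = {
    findSetter(stage, methodName, value).invoke(stage, value.asInstanceOf[AnyRef])
    ()
  }
}
```

getMethod requires an exact parameter-type match, so this shortcut fails with NoSuchMethodException when a setter declares a supertype of the runtime class (for example a Seq parameter receiving a List). Explicit cases would still be needed for such setters.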

@SAI-Aghylas SAI-Aghylas merged commit 3968405 into develop May 25, 2021
SAI-Aghylas added a commit that referenced this pull request Jul 3, 2021
* Added Unit Tests trait

* Added Unit Tests trait (#1)

* Started work on YAML Parser

* WIP - Yaml parser

* extract stages from yamls

* development of parser

* TAG: Parser with annotations

* yaml parser stages creation

* cleaned dataset stage

* fixed cara parser file

* Add LogisticRegression Class with building lr model

* Finalized LogisticRegression Class and move GetMethode function to the trait class CaraStage

* Merged LogisticRegressionStage and cleaned

* Building Spark ML pipelines and unit tests

* Refactored CaraParser and added parse method + updated tests

* yaml_parser: update unit tests for CaraYaml

* Updated CaraParser adding try, update tests

* Feature/model schema (#3)

* LogisticRegressionTest contains error to clear

* Finalize LogisticRegression's class and tests

* refactor names to camel case and correct spaces

Co-authored-by: merzouk <merzoukoumedda@gmail.com>

* Started model training

* Added Evaluator parser

* Evolution of parser

* Feature/dataset parser (#6)

* first implementation of HashingTF, IDF,Tokenizer,Word2Vec

* add new Dataset Features + Fix build function

* fix CountVectorizerModel + handle Model classes

* build edited, in progress for

* add tests for all classes -- must review CountVModel to fix tests

* fixed CountVectorizerModel Test

* getMethode completed + all class and tests ok + indentation ok

* fixed PR changes

* Feature/yaml parser (#7)

* Added Evaluator parser

* Evolution of parser

* Added tuner parser

* Feature/yaml parser (#8)

* Added Evaluator parser

* Evolution of parser

* Added tuner parser

* Added companion object to CaraParser

* Feature/yaml parser (#9)

* Added Evaluator parser

* Evolution of parser

* Added tuner parser

* Added companion object to CaraParser

* Added tuner to CaraPipeline

* skeleton for CaraModel

* Renamed CaraYaml class to CaraYaml Reader

* Created CaraModel Pipeline skeleton for train

* first commit branch

* finish generateModel method and add CaraModelTest class

* review cara_pipine_model test

* Feature/cara pipeline model (#10)

* first commit branch

* finish generateModel method and add CaraModelTest class

* review cara_pipine_model test

Co-authored-by: merzouk <merzoukoumedda@gmail.com>

* Changed datasetPath to dataset itself

* finalize LinearRegression class plus tests (#11)

Co-authored-by: merzouk <merzoukoumedda@gmail.com>

* Added evaluation method

* updated cara model

* Feature/model schema (#13)

* LogisticRegressionTest contains error to clear

* Finalize LogisticRegression's class and tests

* refactor names to camel case and correct spaces

* Adjust LogisticRegression code formatting and add DecisionTreeClassifier model class and tests

* Add GBTClassifier model class and tests

* tests not ended

* finalize tests for new model classes

* CarastageMapper update

* update caraMapperModel

Co-authored-by: merzouk <merzoukoumedda@gmail.com>

* Feature/model schema (#15)

* LogisticRegressionTest contains error to clear

* Finalize LogisticRegression's class and tests

* refactor names to camel case and correct spaces

* Adjust LogisticRegression code formatting and add DecisionTreeClassifier model class and tests

* Add GBTClassifier model class and tests

* tests not ended

* finalize tests for new model classes

* CarastageMapper update

* update caraMapperModel

* Add Kmeans, LDA and NaiveBayes models and their class tests

Co-authored-by: merzouk <merzoukoumedda@gmail.com>

* added MulticlassClassificationEvaluator (#16)

* Overwrite save

* Fixed the case where no tuner is specified

* Removed sparksession from CaraModel

* Feature/model schema (#19)

* LogisticRegressionTest contains error to clear

* Finalize LogisticRegression's class and tests

* refactor names to camel case and correct spaces

* Adjust LogisticRegression code formatting and add DecisionTreeClassifier model class and tests

* Add GBTClassifier model class and tests

* tests not ended

* finalize tests for new model classes

* CarastageMapper update

* update caraMapperModel

* Add Kmeans, LDA and NaiveBayes models and their class tests

* add decisionTreeRegressor class and tests

* Add RandomForestRegressor class and tests

* Add GBTRegressor class and tests

Co-authored-by: merzouk <merzoukoumedda@gmail.com>

* Global refactoring (#20)

* Made build method for stages generic

* Code review on source code

* Added some scaladoc

* Started reviewing tests

* Refacto on unit tests

* Renamed packages

* Added father package

* Publish to repository

* Feature/readme documentation (#22)

* Set readme plan

* Update ReadMe

* Update README.md

* ReadMe Updates

* ReadMe updates

* updates ReadMe

* ReadMe updates

* Update README.md

* Updates ReadMe

* Updates ReadMe

* Update README.md

* ReadMe Updates

* ReadMe updates

* Update README.md

* Update README.md

* Add Schema

* Update README.md

* Update README.md

* Update README.md

* Update ReadMe add CaraML jar link

* update readme

Co-authored-by: merzouk <merzoukoumedda@gmail.com>

* fix ReadMe (#23)

Co-authored-by: merzouk <merzoukoumedda@gmail.com>

* Fix readme requirements (#24)

* update readme

* update readme

* update readme

* update readme

* update readme

Co-authored-by: merzouk <merzoukoumedda@gmail.com>

* Changed build version for release

* Feature/generate report (#28)

* generateReport fixed + modelEvaluate

* fixed Resources files

* Generate Report finished

* code clean generateReport

Co-authored-by: merzouk <merzoukoumedda@gmail.com>
Co-authored-by: SAI-Aghylas <55828644+SAI-Aghylas@users.noreply.github.com>
Co-authored-by: merzouk13 <57535044+merzouk13@users.noreply.github.com>