Unable to create features dynamically. #228
I have also tried this example, where feature engineering is done automatically. Kindly provide a solution to achieve this functionality.
Yes, we have a CSV reader that allows inferring the schema automatically. See CSVAutoReader. Here is an example of how to use it:

```scala
val autoReader = new CSVAutoReader[GenericRecord](readPath, _.get("id").toString)
```

Please let me know if it works for you.
Hi, thanks for the help. I am trying to read with CSVAutoReader. For that, in this example I replaced

```scala
val passengersData = DataReaders.Simple.csvCase[Passenger](pathToData, key = _.id.toString)
  .readDataset()(spark, newProductEncoder).toDF()
```

with

```scala
val passengersData = new CSVAutoReader[GenericRecord](pathToData, key = _.get("id").toString,
  headers = Seq("id", "survived", "pClass", "name", "sex", "age", "sibSp", "parCh",
    "ticket", "fare", "cabin", "embarked")
).read()(spark)
```

But the next step,

```scala
val (survived, features) = FeatureBuilder.fromDataFrame[RealNN](passengersData, response = "survived")
val featureVector = features.transmogrify()
```

creates the features from a DataFrame, while the result of CSVAutoReader's read method is not convertible to a DataFrame. It would be great if you could provide a working example of this, as I am not able to find one in the provided hello-world examples.
In general, readers only allow reading typed data into an RDD. What I understand from your case is that you want both the feature definitions and the data reader to be created automatically. The only way to currently allow that is to use our cli codegen tool, as explained here. You can try using it.
Yes. So, I would like to raise a feature request: the ability to convert a Spark DataFrame into a TransmogrifAI DataFrame that we can pass directly to

```scala
val (survived, features) = FeatureBuilder.fromDataFrame[RealNN](transmogrifConvertedDFfromSpark, response = "survived")
```

and to the workflow:

```scala
val model = new OpWorkflow().setInputDataset(transmogrifConvertedDFfromSpark).setResultFeatures(prediction).train()(spark)
```
Here is a snapshot of the code I am trying to run using a Spark DataFrame (I have added column names to the CSV file):

```scala
val df = spark.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(pathToData)

val (survived, features) = FeatureBuilder.fromDataFrame[RealNN](df, response = "survived")
```

df is the Spark DataFrame, and I get the following error. (I am assuming the issue is with the primitive data type `RealNN`, because according to the error below, TransmogrifAI is not converting the integral type to the `RealNN` type.)
You would need to apply a column transformation on the DataFrame first.
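A hedged sketch of such a transformation, using Spark's `withColumn` and `cast` to turn the integral response column into a double (the column name `survived` and variable `df` follow the thread above; this is one plausible fix, not necessarily the exact one intended):

```scala
import org.apache.spark.sql.types.DoubleType

// Cast the integral "survived" column to double so that
// FeatureBuilder.fromDataFrame can pick it up as RealNN.
val castedDf = df.withColumn("survived", df("survived").cast(DoubleType))
```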
Yes, now it's running. Thanks @tovbinm. One more thing I want to know: in the case of multi-class classification, e.g. three classes ("a", "b", "c"), if I want to follow the same procedure, should I provide string indexes as the response, e.g. (1, 2, 3), or should it be one-hot encoded, e.g. ([1,0,0], [0,1,0], [0,0,1])?
We don't modify the label automatically, and yes, you should apply an indexer on the response feature for multiclass. E.g.:

```scala
val response: FeatureLike[PickList] = ...
val indexed: FeatureLike[RealNN] = response.indexed()
```
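For comparison, the equivalent indexing on a plain Spark DataFrame could be done with Spark ML's `StringIndexer` (a sketch; the column names `label` and `labelIndex` are made up for illustration):

```scala
import org.apache.spark.ml.feature.StringIndexer

// Map string classes ("a", "b", "c") to numeric indices (0.0, 1.0, 2.0),
// ordered by label frequency by default.
val indexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("labelIndex")
val indexedDf = indexer.fit(df).transform(df)
```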
Describe the bug
I was trying to run the sample examples given here. In the examples we need to define a schema (case class) for the data supplied to the TransmogrifAI reader, e.g. in the Boston housing price example. I would like to know whether I can work with TransmogrifAI without defining a schema for the data (similar to inferSchema in Spark), so that the schema is automatically inferred from the provided data, for example from a CSV.
To Reproduce
Minimal set of steps or code snippet to reproduce the behavior
Expected behavior
I just need to provide a CSV file (or any other file); it should create a DataFrame by inferring the schema and run the algorithms on top of that.
Logs or screenshots
If applicable, add logs or screenshots to help explain your problem.
Additional context
I know there is functionality where we can use an Avro schema to generate a Java schema class, but we still need to define FeatureBuilders on top of that, e.g. the Iris data multi-classification example.
Can it be done without defining these schemas, i.e. by inferring the schema, creating the feature builders automatically, and running the algorithms on top of that?
Example of inferSchema from a CSV file in Spark Scala.
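For reference, a minimal Spark Scala snippet that infers the schema from a CSV file might look like this (a sketch assuming an existing `SparkSession` named `spark` and a path variable `pathToData`):

```scala
// Read a CSV with a header row and let Spark infer column types.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(pathToData)

df.printSchema() // shows the inferred types, e.g. integer, double, string
```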