Weka package that takes advantage of the Tablesaw Java dataframe and visualization library.
Use bootstrapp to download all the dependencies (adjust the Tablesaw version if necessary):
java -jar bootstrapp-0.1.5-spring-boot.jar --dependency tech.tablesaw:tablesaw-core:0.38.1 --output_dir ./out
Copy all jars from ./out/target/lib into the project's lib directory.
- Loader: TableSawCsvLoader - for loading CSV files
- Saver: TableSawCsvSaver - for saving CSV files
The weka.filters.Tablesaw filter allows you to apply the following table operations:
- CountBy - generates a table with two columns: the first holds the name of the categorical value and the second the count for that value
- First - returns the first X rows
- Last - returns the last X rows
- MissingValueCounts - counts the missing values (outputs a single row)
- MultiTableOperation - applies all specified table operations sequentially
- PassThrough - dummy operation that just passes the data through
- RemoveColumns - removes the specified columns
- RetainColumns - keeps the specified columns
- RemoveColumnsWithMissingValues - drops columns with missing values
- RemoveRowsWithMissingValues - drops rows with missing values
- SampleN - generates a sub-sample of size N
- SampleSplit - splits the data into two parts and returns either the first or the second part
- SampleX - generates a sub-sample of proportion X (0.0-1.0)
- Sort - sorts the data using the specified columns (ascending or descending)
- Summary - generates a summary for the specified column
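To make the semantics of a few of these operations concrete, here is a plain-Java sketch using only the standard library. It is not the package's code and does not use Tablesaw's Table API: rows are modelled as String arrays with null marking a missing value, and the method names (countBy, missingValueCounts, first, last, sampleSplit) and the seeded shuffle in sampleSplit are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for a dataframe: each row is a String[], null = missing.
// Illustrates what some operations compute; NOT the package's implementation.
class TableOpsSketch {

    // CountBy-style: one entry per distinct value of a categorical column.
    // Missing values are skipped here; the package's handling may differ.
    static Map<String, Integer> countBy(List<String[]> rows, int col) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String[] row : rows) {
            if (row[col] != null) {
                counts.merge(row[col], 1, Integer::sum);
            }
        }
        return counts;
    }

    // MissingValueCounts-style: a single "row" with one count per column.
    static int[] missingValueCounts(List<String[]> rows, int numCols) {
        int[] missing = new int[numCols];
        for (String[] row : rows) {
            for (int c = 0; c < numCols; c++) {
                if (row[c] == null) {
                    missing[c]++;
                }
            }
        }
        return missing;
    }

    // First-style: the first n rows (all rows if there are fewer).
    static List<String[]> first(List<String[]> rows, int n) {
        return new ArrayList<>(rows.subList(0, Math.min(n, rows.size())));
    }

    // Last-style: the last n rows (all rows if there are fewer).
    static List<String[]> last(List<String[]> rows, int n) {
        return new ArrayList<>(rows.subList(Math.max(0, rows.size() - n), rows.size()));
    }

    // SampleSplit-style: shuffle with a fixed seed, cut at the given proportion
    // (clamped to the table size) and return either the first or second part.
    static List<String[]> sampleSplit(List<String[]> rows, double proportion,
                                      boolean firstPart, long seed) {
        List<String[]> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(proportion * shuffled.size());
        cut = Math.max(0, Math.min(shuffled.size(), cut));
        return new ArrayList<>(firstPart ? shuffled.subList(0, cut)
                                         : shuffled.subList(cut, shuffled.size()));
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"red", "1.0"},
                new String[] {"blue", null},
                new String[] {"red", "3.5"},
                new String[] {null, "2.0"});

        System.out.println(countBy(rows, 0));                             // {red=2, blue=1}
        System.out.println(Arrays.toString(missingValueCounts(rows, 2))); // [1, 1]
        System.out.println(first(rows, 2).size());                        // 2
        System.out.println(last(rows, 1).get(0)[1]);                      // 2.0
        System.out.println(sampleSplit(rows, 0.5, true, 1L).size());      // 2
    }
}
```

With a fixed seed the split is reproducible, and the first and second parts for the same seed and proportion together cover every row exactly once.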
Note: Since the data needs to be converted into Tablesaw's dataframe format, it may still get modified (e.g., attribute types may change), even when the PassThrough operation is selected.
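The conversion code itself is not described here. Purely as an illustration of how a type change can happen, the following stdlib-only sketch assumes that column types are re-detected from the cell values during conversion. The ColumnType enum and the inferType rule (all non-missing values parse as numbers means numeric) are hypothetical. Under that assumption, a nominal attribute whose labels all look like numbers comes back as numeric.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical type re-detection rule; NOT the package's actual conversion logic.
class TypeInferenceSketch {

    enum ColumnType { NUMERIC, NOMINAL }

    // A column whose non-missing values all parse as numbers is treated as
    // NUMERIC, otherwise as NOMINAL. An all-missing column counts as NUMERIC.
    static ColumnType inferType(List<String> values) {
        for (String v : values) {
            if (v == null) {
                continue; // missing values do not influence the decision
            }
            try {
                Double.parseDouble(v);
            } catch (NumberFormatException e) {
                return ColumnType.NOMINAL;
            }
        }
        return ColumnType.NUMERIC;
    }

    public static void main(String[] args) {
        // A nominal Weka attribute whose labels happen to look like numbers,
        // e.g. @attribute grade {1,2,3}: re-detected as NUMERIC.
        System.out.println(inferType(Arrays.asList("1", "2", "3", "2")));
        // Genuinely textual labels stay NOMINAL.
        System.out.println(inferType(Arrays.asList("low", "high", null)));
    }
}
```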
Use the following dependency in your pom.xml:
<dependency>
  <groupId>com.github.fracpete</groupId>
  <artifactId>tablesaw-weka-package</artifactId>
  <version>2021.3.3</version>
  <type>jar</type>
  <exclusions>
    <exclusion>
      <groupId>nz.ac.waikato.cms.weka</groupId>
      <artifactId>weka-dev</artifactId>
    </exclusion>
  </exclusions>
</dependency>
For more information on how to install the package, see: