No description, website, or topics provided.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
orange
README.md
attribute.csv
horse-colic-reduce-attributes.csv
horse-colic.arff
horse-colic.csv
horse-colic.data
run weka.txt

README.md

Introduction

In practice, data quality is bad (missing and incomplete). After cleansing and transforming dataset, we can conduct data mining.

Blog: https://ongxuanhong.wordpress.com/2015/08/20/tien-xu-ly-du-lieu-horse-colic-dataset/

Exploratory analysis

Exploratory analysis

  • Number of instances: 300
  • Number of attributes: 28
  • Attribute information: attribute.csv Attribute information

Discretize and replace missing value

Using Weka filter Unsupervised/Attribute

  • Normalize: Normalizes all numeric values in the given dataset (apart from the class attribute, if set). The resulting values are by default in [0,1] for the data used to compute the normalization intervals. But with the scale and translation parameters one can change that, e.g., with scale = 2.0 and translation = -1.0 you get values in the range [-1,+1].
  • ReplaceMissingValue: Replaces all missing values for nominal and numeric attributes in a dataset with the modes and means from the training data.
  • Discretize: An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by simple binning. Skips the class attribute if set.
  • NumericToNominal: A filter for turning numeric attributes into nominal ones. Unlike discretization, it just takes all numeric values and adds them to the list of nominal values of that attribute. Useful after CSV imports, to enforce certain attributes to become nominal, e.g., the class attribute, containing values from 1 to 5.