# Feature construction

## Data transformation

* One-hot encoding 
* Feature hashing 
* Dracula (Distributed Robust Algorithms for Count-based Learning). 
    * Aggregate past data in a very simple data structure: a set of count tables, where each table associates an attribute (or combination of attributes) with its historical counts for each label value
    * Calculate conditional label probabilities (or log-odds) from the counts
    * Treat these probabilities as features
    * Use additive smoothing and a min-count cutoff to adjust the probabilities and reduce the data size
    * For important features, calculate aggregate statistics (e.g. min, max, median, mean, majority category) and create features out of them
* Feature interactions: 
    * Add
    * Subtract
    * Multiply
    * Ratio
    * Clustering
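
The count-table idea above can be illustrated with a minimal sketch for a single categorical attribute and a binary label. The function names, the `alpha` and `min_count` parameters and the toy data are assumptions made for illustration; this is not the Dracula implementation itself. Here, values seen fewer than `min_count` times simply fall back to the smoothing prior.

```python
from collections import defaultdict
import math


def build_count_table(values, labels):
    """Map each attribute value to [count of label 0, count of label 1]."""
    table = defaultdict(lambda: [0, 0])
    for value, label in zip(values, labels):
        table[value][label] += 1
    return dict(table)


def count_features(value, table, alpha=1.0, min_count=2):
    """Return (smoothed P(label=1 | value), smoothed log-odds).

    alpha is the additive-smoothing pseudo-count (hypothetical default).
    Values seen fewer than min_count times are treated as unseen,
    so only the smoothing prior remains for them.
    """
    neg, pos = table.get(value, (0, 0))
    if neg + pos < min_count:
        neg, pos = 0, 0
    prob = (pos + alpha) / (neg + pos + 2 * alpha)
    log_odds = math.log((pos + alpha) / (neg + alpha))
    return prob, log_odds


# Toy history: which city a user came from and whether they clicked.
cities = ["paris", "paris", "paris", "rome", "rome", "oslo"]
clicked = [1, 1, 0, 0, 0, 1]

table = build_count_table(cities, clicked)
paris_prob, paris_log_odds = count_features("paris", table)
oslo_prob, oslo_log_odds = count_features("oslo", table)  # below min_count
```

The resulting probability or log-odds replaces the raw category as a numeric feature, which keeps the feature space small even for attributes with many distinct values.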
    
## Representations
* Encoding dates: 
    * Number of days since a reference date 
    * Isolate second, minute, hour, day, month, day of year as separate features
    * Is it morning, weekend, holiday, Black Friday, Christmas, free from work, school day?
    * Handle the dates in the user's local time
    * Relate the date to external events
    * For some of these features you will require at least two periods to make training meaningful
* Periodic data
    * Decompose a time series into seasonal, trend and irregular components
    * Determine the time period
    * Generate the features from lags up to at least 2 × period
    * Rolling mean, min, max, etc. for the features (to keep track of the trend)
    * Rolling majority (or Dracula Count) for categorical features
    * Always test on future data
    * Estimate error as a function of time in the future
* Spatial Data
    * Spatial data encode locations, such as GPS coordinates, country, city names or zip codes
    * Use kriging or k-means clustering to obtain intermediate values 
    * Obtain local weather, financial information, etc. from zip codes, coordinates
* Text data
    * Cleaning
        * Lowercasing
        * Converting accented characters
        * Removing non-standard characters
    * Tokenizing
        * Punctuation marks
        * N-Grams
        * Skip-grams        
        * Char-grams
        * Affixes
    * Removing
        * Stop words
        * Rare words
        * Common words
    * Rooting
        * Spelling correction
        * Stemming
        * Lemmatization
        * Synonym detection
    * Enriching
        * Entity Insertion, Extraction
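
As a rough sketch of the date-encoding and periodic-data items in the list above, the following uses only the standard library. The reference date, the function names and the toy series are hypothetical; a real pipeline would also need holiday calendars and time-zone handling, which are not shown.

```python
from datetime import date

REFERENCE_DATE = date(2018, 1, 1)  # hypothetical reference date


def date_features(d):
    """Split a date into several simple numeric/boolean features."""
    return {
        "days_since_reference": (d - REFERENCE_DATE).days,
        "month": d.month,
        "day_of_year": d.timetuple().tm_yday,
        "day_of_week": d.weekday(),       # Monday is 0
        "is_weekend": d.weekday() >= 5,   # Saturday or Sunday
    }


def lag_features(series, period):
    """Build one row per time step that has at least 2 * period of history.

    Each row holds lags 1..2*period, rolling statistics over the last
    `period` observations, and the value to predict.
    """
    max_lag = 2 * period
    rows = []
    for t in range(max_lag, len(series)):
        row = {f"lag_{k}": series[t - k] for k in range(1, max_lag + 1)}
        window = series[t - period:t]
        row["rolling_mean"] = sum(window) / period
        row["rolling_min"] = min(window)
        row["rolling_max"] = max(window)
        row["target"] = series[t]
        rows.append(row)
    return rows


features = date_features(date(2018, 2, 10))

series = [10, 12, 14, 11, 13, 15, 12, 14, 16]  # toy data with period 3
rows = lag_features(series, period=3)

# Test on future data: train on the earlier rows, evaluate on the later ones.
train_rows, test_rows = rows[:-1], rows[-1:]
```

Splitting chronologically rather than at random keeps information from the future out of the training rows.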
    
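
The text-data steps listed above (cleaning, tokenizing, removing stop words, building n-grams and char-grams) could look roughly like this. The stop-word list is a tiny illustrative assumption, and stemming, lemmatization and spelling correction are left out because they normally rely on external libraries.

```python
import re
import unicodedata

STOP_WORDS = {"the", "a", "of", "is"}  # tiny illustrative list


def clean(text):
    """Lowercase, strip accents and replace non-alphanumeric characters."""
    text = text.lower()
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Anything that is not a-z, 0-9 or whitespace becomes a space.
    return re.sub(r"[^a-z0-9\s]", " ", text)


def tokenize(text):
    """Split cleaned text on whitespace and drop stop words."""
    return [tok for tok in clean(text).split() if tok not in STOP_WORDS]


def ngrams(tokens, n):
    """Word n-grams: every run of n consecutive tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(token, n):
    """Character n-grams of a single token."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


tokens = tokenize("The Café is OPEN, déjà vu!")
bigrams = ngrams(tokens, 2)
trigram_chars = char_ngrams("cafe", 3)
```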

## Feature selection

* Filtering 
* Wrapping 
* Embedded 
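
Of the three families above, filtering is the simplest to sketch: each feature is scored on its own, without training a model, and the top-scoring ones are kept. The example below uses absolute Pearson correlation as the score, which is only one possible choice; the function names and toy columns are assumptions for illustration. Wrapper and embedded methods need a model and are not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def filter_select(columns, target, k):
    """Keep the k columns with the highest |correlation| to the target."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


columns = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
target = [2.1, 3.9, 6.2, 8.0, 9.8]

selected, scores = filter_select(columns, target, k=1)
```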

![image](https://www.dropbox.com/s/yydr8a0pmahb6pz/Screenshot%202018-02-10%2019.26.59.png?dl=1)

## Model metrics

* Regression
    * RMSD (root-mean-square deviation): the sample standard deviation of the differences between predicted and observed values.
    * $R^2$
    * Median deviation
* Classification
    * ROC-AUC
    * PR-AUC
    * Accuracy
    * Precision
    * Recall

![image](https://www.dropbox.com/s/b7biatwrjdjgc8d/Screenshot%202018-02-10%2019.34.43.png?dl=1)
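
Most of the metrics listed above fit in a few lines each. The sketch below is a plain-Python illustration with made-up data, not a replacement for a tested library. "Median deviation" is interpreted here as the median absolute error, which is an assumption, and ROC-AUC is computed by the pairwise-ranking definition, which is only practical for small samples. PR-AUC is not shown.

```python
import math
from statistics import median


def rmsd(y_true, y_pred):
    """Root-mean-square deviation (divides by n, not n - 1)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def median_abs_error(y_true, y_pred):
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def classification_metrics(labels, preds):
    """Accuracy, precision and recall for binary 0/1 labels."""
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    tn = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Regression example
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
regression = (rmsd(y_true, y_pred), r_squared(y_true, y_pred),
              median_abs_error(y_true, y_pred))

# Classification example: scores thresholded at 0.5 give the predictions
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]
classification = classification_metrics(labels, preds)
auc = roc_auc(labels, scores)
```

Accuracy, precision and recall depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds.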

## Monitoring Performance
* A/B Experiment