# Available datasets

As mentioned in the [Reading data](reading-data) section, `creme` provides several datasets to experiment with:

In [1]:
from creme import stream

stream.available_datasets()

['AirlinePassengers',
 'Bananas',
 'Bikes',
 'ChickWeights',
 'CreditCard',
 'Elec2',
 'HTTP',
 'Higgs',
 'ImageSegments',
 'Insects',
 'MaliciousURL',
 'MovieLens100K',
 'Music',
 'Phishing',
 'Restaurants',
 'SMSSpam',
 'SMTP',
 'SolarFlare',
 'TREC07',
 'Taxis',
 'TrumpApproval']

## Regression

In [19]:
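def print_datasets(task):
    """Print the summary of every available dataset whose task matches `task`."""
    datasets = filter(lambda dataset: dataset.task == task, (
        stream.iter_dataset(name)
        for name in stream.available_datasets()
    ))
    # Separate consecutive summaries with a dashed rule.
    print(f"\n\n{'-' * 20}\n\n".join(map(str, datasets)))

print_datasets('Regression')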

AirlinePassengers dataset

              Task  Regression                                                                   
 Number of samples  144                                                                          
Number of features  1                                                                            
            Sparse  False                                                                        
              Path  /Users/mhalford/projects/creme-ml/creme/creme/datasets/airline-passengers.csv

--------------------

Bikes dataset

              Task  Regression                                                    
 Number of samples  182,470                                                       
Number of features  8                                                             
            Sparse  False                                                         
              Path  /Users/mhalford/creme_data/Bikes/toulouse_bikes.csv           
               URL  https://ma

## Binary classification

In [20]:
print_datasets('Binary classification')

Bananas dataset

              Task  Binary classification                                            
 Number of samples  5,300                                                            
Number of features  2                                                                
            Sparse  False                                                            
              Path  /Users/mhalford/projects/creme-ml/creme/creme/datasets/banana.zip

--------------------

CreditCard dataset

              Task  Binary classification                                          
 Number of samples  150,828,752                                                    
Number of features  30                                                             
            Sparse  False                                                          
              Path  /Users/mhalford/creme_data/CreditCard/creditcard.csv           
               URL  https://maxhalford.github.io/files/datasets/creditcardfraud.zip
      

## Multi-class classification

In [21]:
print_datasets('Multi-class classification')

ImageSegments dataset

              Task  Multi-class classification                                            
 Number of samples  2,310                                                                 
Number of features  18                                                                    
            Sparse  False                                                                 
              Path  /Users/mhalford/projects/creme-ml/creme/creme/datasets/segment.csv.zip

--------------------

Insects dataset, abrupt_balanced variant

              Task  Multi-class classification                                                              
 Number of samples  52,848                                                                                  
Number of features  33                                                                                      
 Number of classes  6                                                                                       
            Sparse  F

Note that the `'Insects'` dataset has multiple variants:

In [5]:
insects = stream.iter_dataset('Insects')
insects.variants

['abrupt_balanced',
 'abrupt_imbalanced',
 'gradual_balanced',
 'gradual_imbalanced',
 'incremental-abrupt_balanced',
 'incremental-abrupt_imbalanced',
 'incremental-reoccurring_balanced',
 'incremental-reoccurring_imbalanced',
 'incremental_balanced',
 'incremental_imbalanced',
 'out-of-control']
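
The variant names above appear to follow a `<drift>_<balance>` pattern, with `'out-of-control'` as the one exception. That reading is an assumption drawn from the names alone, not something the library documents here. Under that assumption, a minimal sketch for selecting only the balanced variants could look like this (`split_variant` is a hypothetical helper, not part of `creme`):

```python
# Variant names copied from the output above.
variants = [
    'abrupt_balanced',
    'abrupt_imbalanced',
    'gradual_balanced',
    'gradual_imbalanced',
    'incremental-abrupt_balanced',
    'incremental-abrupt_imbalanced',
    'incremental-reoccurring_balanced',
    'incremental-reoccurring_imbalanced',
    'incremental_balanced',
    'incremental_imbalanced',
    'out-of-control',
]


def split_variant(variant):
    """Split a name into (drift, balance); balance is None if there is no '_'.

    Hypothetical helper: it assumes the last underscore separates the drift
    description from the class balance.
    """
    drift, sep, balance = variant.rpartition('_')
    if not sep:
        return variant, None
    return drift, balance


# Keep only the variants whose suffix is exactly 'balanced'.
balanced = [v for v in variants if split_variant(v)[1] == 'balanced']
print(balanced)
```

Each selected name can then be passed as the `variant` argument when loading the dataset.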

You can load a particular variant by passing the `variant` keyword argument to `iter_dataset`:

In [8]:
dataset = stream.iter_dataset('Insects', variant='abrupt_imbalanced')
dataset

Insects dataset, abrupt_imbalanced variant

              Task  Multi-class classification                                                                
 Number of samples  355,275                                                                                   
Number of features  33                                                                                        
 Number of classes  6                                                                                         
            Sparse  False                                                                                     
              Path  /Users/mhalford/creme_data/Insects/INSECTS-abrupt_imbalanced_norm.arff                    
               URL  http://sites.labic.icmc.usp.br/vsouza/repository/creme/INSECTS-abrupt_imbalanced_norm.arff
              Size  104.95 MB                                                                                 
        Downloaded  False                                           

## Multi-output binary classification

In [22]:
print_datasets('Multi-output binary classification')

Music dataset

              Task  Multi-output binary classification                                                    
 Number of samples  593                                                                                   
Number of features  72                                                                                    
 Number of outputs  6                                                                                     
            Sparse  False                                                                                 
              Path  /Users/mhalford/creme_data/Music/music.csv                                            
               URL  https://raw.githubusercontent.com/scikit-multiflow/streaming-datasets/master/music.csv
              Size  370.1 KB                                                                              
        Downloaded  True                                                                                  


## Multi-output regression

In [23]:
print_datasets('Multi-output regression')

SolarFlare dataset

              Task  Multi-output regression                                                   
 Number of samples  1,066                                                                     
Number of features  10                                                                        
 Number of outputs  3                                                                         
            Sparse  False                                                                     
              Path  /Users/mhalford/projects/creme-ml/creme/creme/datasets/solar-flare.csv.zip
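
The `print_datasets` helper used on this page rebuilds every dataset object each time it is called for a new task. When a listing for all tasks is needed, grouping in a single pass may be simpler. The sketch below illustrates the idea with stand-in objects so that it runs without `creme` installed: `DatasetInfo` and `group_by_task` are hypothetical, the `name` attribute is an assumption, and only the `task` attribute is taken from the code above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DatasetInfo:
    """Hypothetical stand-in exposing only the attributes used in this sketch."""
    name: str
    task: str


def group_by_task(datasets):
    """Map each task to the names of the datasets that share it, in one pass."""
    groups = defaultdict(list)
    for dataset in datasets:
        groups[dataset.task].append(dataset.name)
    return dict(groups)


# Names and tasks taken from the outputs shown on this page.
datasets = [
    DatasetInfo('AirlinePassengers', 'Regression'),
    DatasetInfo('Bikes', 'Regression'),
    DatasetInfo('Bananas', 'Binary classification'),
    DatasetInfo('CreditCard', 'Binary classification'),
    DatasetInfo('ImageSegments', 'Multi-class classification'),
    DatasetInfo('Insects', 'Multi-class classification'),
    DatasetInfo('Music', 'Multi-output binary classification'),
    DatasetInfo('SolarFlare', 'Multi-output regression'),
]

by_task = group_by_task(datasets)
for task, names in by_task.items():
    print(f"{task}: {', '.join(names)}")
```

With the real library, the stand-in list would presumably be replaced by the objects returned from `stream.iter_dataset(name)` for each name in `stream.available_datasets()`.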
