Modeling species’ geographic distributions is an important problem in conservation biology. In this example we model the geographic distribution of two South American mammals given past observations and 14 environmental variables. Since we have only positive examples (there are no unsuccessful observations), we cast this problem as a density estimation problem and use the OneClassSVM provided by the sklearn.svm package as our modeling tool. The dataset is provided by Phillips et al. (2006).
The two species are:

* [Bradypus variegatus](http://www.iucnredlist.org/details/3038/0), the Brown-throated Sloth.
* [Microryzomys minutus](http://www.iucnredlist.org/details/13408/0), also known as the Forest Small Rice Rat, a rodent that lives in Peru, Colombia, Ecuador, and Venezuela.
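
The presence-only setup described above can be sketched on synthetic data. This is a minimal illustration, not the tutorial's actual pipeline: the three "environmental variables" and their values are hypothetical, and only the `OneClassSVM` parameters (`nu=0.1`, `kernel="rbf"`, `gamma=0.5`) are taken from the code further down. `np.ravel` is used so the sketch does not depend on whether `decision_function` returns a 1-D or a column-shaped array.

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)

# Hypothetical presence records: 200 sites, 3 made-up environmental variables
presence = rng.normal(loc=[20.0, 1500.0, 0.6],
                      scale=[2.0, 200.0, 0.1],
                      size=(200, 3))

# Standardize using statistics from the presence records only
mean = presence.mean(axis=0)
std = presence.std(axis=0)
presence_std = (presence - mean) / std

# Fit on positive examples only; there are no absence labels
clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.5)
clf.fit(presence_std)

# Score one site near the centre of the data and one far outside it
typical = np.array([[20.0, 1500.0, 0.6]])
extreme = np.array([[35.0, 100.0, 0.05]])
candidates = (np.vstack([typical, extreme]) - mean) / std
scores = np.ravel(clf.decision_function(candidates))

print(scores)  # the first score should be higher than the second
```

Higher decision values indicate environments that resemble the observed sites more closely, which is how the fitted model is read as a relative suitability map.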

### References

* “[Maximum entropy modeling of species geographic distributions](http://www.cs.princeton.edu/~schapire/papers/ecolmod.pdf)” S. J. Phillips, R. P. Anderson, R. E. Schapire - Ecological Modelling, 190:231-259, 2006.

### Version

In [1]:
import sklearn
sklearn.__version__

'0.18'

### Imports

This tutorial imports [fetch_species_distributions](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_species_distributions.html#sklearn.datasets.fetch_species_distributions).

In [2]:
from __future__ import print_function

import plotly.graph_objs as go
import plotly.plotly as py

from time import time
import numpy as np

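# Import paths as used with scikit-learn 0.18 (the version shown above);
# later releases moved Bunch to sklearn.utils
from sklearn.datasets.base import Bunch
from sklearn.datasets import fetch_species_distributions
from sklearn.datasets.species_distributions import construct_grids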
from sklearn import svm, metrics

print(__doc__)

Automatically created module for IPython interactive environment


### Calculations

In [3]:

def create_species_bunch(species_name, train, test, coverages, xgrid, ygrid):
    """Create a bunch with information about a particular organism

    This will use the test/train record arrays to extract the
    data specific to the given species name.
    """
    bunch = Bunch(name=' '.join(species_name.split("_")[:2]))
    species_name = species_name.encode('ascii')
    points = dict(test=test, train=train)

    for label, pts in points.items():
        # choose points associated with the desired species
        pts = pts[pts['species'] == species_name]
        bunch['pts_%s' % label] = pts

        # determine coverage values for each of the training & testing points
        ix = np.searchsorted(xgrid, pts['dd long'])
        iy = np.searchsorted(ygrid, pts['dd lat'])
        bunch['cov_%s' % label] = coverages[:, -iy, ix].T

    return bunch
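
The coverage lookup in `create_species_bunch` can be illustrated on a toy grid. The sketch below uses hypothetical coordinates and a made-up `coverages` array; it assumes, as the `-iy` index in the function suggests, that the coverage rows are stored from north to south while `ygrid` is ascending.

```python
import numpy as np

# Hypothetical 4 x 5 grid: longitudes and latitudes in ascending order
xgrid = np.array([-80.0, -79.0, -78.0, -77.0, -76.0])
ygrid = np.array([-10.0, -9.0, -8.0, -7.0])

# Two made-up coverage layers with shape (n_layers, Ny, Nx)
coverages = np.arange(2 * 4 * 5).reshape(2, 4, 5)

# Coordinates of two hypothetical observations
lon = np.array([-78.5, -76.2])
lat = np.array([-8.5, -9.9])

# searchsorted returns, for each coordinate, the index at which it would
# be inserted into the sorted grid
ix = np.searchsorted(xgrid, lon)
iy = np.searchsorted(ygrid, lat)

# Negative row indices count from the last row of each layer
features = coverages[:, -iy, ix].T

print(ix, iy)
print(features)  # one row per observation, one column per layer
```

The transpose at the end turns the result into the usual scikit-learn layout of samples by features.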

### Plotting

In [5]:

final_plot_data=[]
final_plot_layout=[]
name = []
name.append("Bradypus variegatus")
name.append("Microryzomys minutus")

species=("bradypus_variegatus_0","microryzomys_minutus_0")

"""
Plot the species distribution.
"""

if len(species) > 2:
    print("Note: when more than two species are provided,"
          " only the first two will be used")

t0 = time()

# Load the compressed data
data = fetch_species_distributions()

# Set up the data grid
xgrid, ygrid = construct_grids(data)

# The grid in x,y coordinates
X, Y = np.meshgrid(xgrid, ygrid[::-1])

# create a bunch for each species
BV_bunch = create_species_bunch(species[0],
                                data.train, data.test,
                                data.coverages, xgrid, ygrid)
MM_bunch = create_species_bunch(species[1],
                                data.train, data.test,
                                data.coverages, xgrid, ygrid)

# background points (grid coordinates) for evaluation
np.random.seed(13)
background_points = np.c_[np.random.randint(low=0, high=data.Ny,
                                            size=10000),
                          np.random.randint(low=0, high=data.Nx,
                                            size=10000)].T

# We'll make use of the fact that coverages[6] has measurements at all
# land points.  This will help us decide between land and water.
land_reference = data.coverages[6]

# Fit, predict, and plot for each species.
for i, species in enumerate([BV_bunch, MM_bunch]):
    print("_" * 80)
    print("Modeling distribution of species '%s'" % species.name)

    # Standardize features
    mean = species.cov_train.mean(axis=0)
    std = species.cov_train.std(axis=0)
    train_cover_std = (species.cov_train - mean) / std

    # Fit OneClassSVM
    print(" - fit OneClassSVM ... ", end='')
    clf = svm.OneClassSVM(nu=0.1, kernel="rbf", gamma=0.5)
    clf.fit(train_cover_std)
    print("done.")
       
    print(" - predict species distribution")

    # Predict species distribution using the training data
    Z = np.ones((data.Ny, data.Nx), dtype=np.float64)

    # We'll predict only for the land points.
    idx = np.where(land_reference > -9999)
    coverages_land = data.coverages[:, idx[0], idx[1]].T

    pred = clf.decision_function((coverages_land - mean) / std)[:, 0]
    Z *= pred.min()
    Z[idx[0], idx[1]] = pred

    levels = np.linspace(Z.min(), Z.max(), 25)
    Z[land_reference == -9999] = -9999
   
    data1 = [
        dict(
            lat = species.pts_train['dd lat'] ,
            lon = species.pts_train['dd long'],
            marker = dict(
                    color ='red',
                    size=5),
            name="Train",
            type = 'scattergeo') ,
        dict(
            lat = species.pts_test['dd lat'] ,
            lon = species.pts_test['dd long'],
            marker = dict(
                    color = 'green',
                    size=5 ),
            type = 'scattergeo',
            name="Test") 
    ]
    
    final_plot_data.append(data1)

    layout = dict(
        title=name[i],
        height=700,
        geo = dict(
            scope = 'south america',
            showland = True,
            landcolor = "rgb(255, 240, 225)",
            showlakes = True,
            lakecolor = "rgb(255, 255, 255)",
            showcountries = True,
            projection = dict(
                type = 'conic conformal',
                rotation = dict(
                    lon = -100)),
            lonaxis = dict(
                showgrid = False),
            lataxis = dict (
                showgrid = False),
        ))
    
    final_plot_layout.append(layout)
    
    # Compute AUC with regard to background points
    pred_background = Z[background_points[0], background_points[1]]
    pred_test = clf.decision_function((species.cov_test - mean)
                                      / std)[:, 0]
    scores = np.r_[pred_test, pred_background]
    y = np.r_[np.ones(pred_test.shape), np.zeros(pred_background.shape)]
    fpr, tpr, thresholds = metrics.roc_curve(y, scores)
    roc_auc = metrics.auc(fpr, tpr)
   
    print("\n Area under the ROC curve : %f" % roc_auc)

print("\ntime elapsed: %.2fs" % (time() - t0))

________________________________________________________________________________
Modeling distribution of species 'bradypus variegatus'
 - fit OneClassSVM ... done.
 - predict species distribution

 Area under the ROC curve : 0.868443
________________________________________________________________________________
Modeling distribution of species 'microryzomys minutus'
 - fit OneClassSVM ... done.
 - predict species distribution

 Area under the ROC curve : 0.993919

time elapsed: 3.80s
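
The AUC figures above compare scores at held-out presence points with scores at randomly drawn background points, which stand in for negatives. A minimal sketch of that computation follows; the two score arrays are synthetic and hypothetical, drawn from normal distributions purely for illustration.

```python
import numpy as np
from sklearn import metrics

rng = np.random.RandomState(13)

# Hypothetical decision scores: presence points score higher on average
pred_test = rng.normal(loc=1.0, scale=1.0, size=100)
pred_background = rng.normal(loc=-1.0, scale=1.0, size=1000)

# Presence points are labelled 1, background points 0
scores = np.r_[pred_test, pred_background]
y = np.r_[np.ones(pred_test.shape), np.zeros(pred_background.shape)]

fpr, tpr, thresholds = metrics.roc_curve(y, scores)
roc_auc = metrics.auc(fpr, tpr)

print("Area under the ROC curve : %f" % roc_auc)
```

Because background points are not confirmed absences, an AUC computed this way is best read as how well the model separates known presences from random locations, not as a conventional classification score.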


In [6]:
fig = {'data':final_plot_data[0], 'layout':final_plot_layout[0]}
py.iplot(fig)

In [7]:
fig = { 'data':final_plot_data[1], 'layout':final_plot_layout[1] }
py.iplot(fig)

### License

Authors:

* Peter Prettenhofer <peter.prettenhofer@gmail.com>
* Jake Vanderplas <vanderplas@astro.washington.edu>

License: BSD 3 clause
