# Building a Possum Regression and Classification Model
*By Stephen FitzSimon*

In [1]:
import pandas as pd
import acquire



## Contents <a name='contents'></a>

1. <a href='#introduction'>Introduction</a>
2. <a href='#acquire'>Acquire the Data</a>
3. <a href='#explore'>Explore the Data</a>
4. <a href='#model'>Model the Data</a>
5. <a href='#conclusion'>Conclusion</a>

<img src='https://upload.wikimedia.org/wikipedia/commons/e/e9/Trichosurus_caninus_Gould.jpg' alt='Illustration of Trichosurus caninus'>

## Introduction <a name='introduction'></a>

1. <a href='#sources'>Sources</a>
2. <a href='#about_data'>About The Data</a>

The short-eared brushtail possum (*Trichosurus caninus*) is a native Australian possum found along the east coast of the continent. The data below come from a 1995 study by Lindenmayer et al.; at that time the species was classified together with the closely related <a href='https://en.wikipedia.org/wiki/Mountain_brushtail_possum'>Mountain Brushtail Possum (*Trichosurus cunninghami*)</a>. As members of the genus *Trichosurus*, these possums are considered more at home on the ground than other members of the Phalangeridae family, yet they remain predominantly leaf eaters.

The goal of this project is to explore the anatomical characteristics of the species and develop a linear regression model to predict an individual's age, and a classification model to predict an individual's sex. 

#### More Information On The Species and The Phalangeridae Family

- <a href='https://en.wikipedia.org/wiki/Short-eared_possum'>Species information on Wikipedia</a>

- <a href='https://en.wikipedia.org/wiki/Mountain_brushtail_possum'>Wikipedia information on the closely related Mountain brushtail possum</a> (note: before 2002 the two species were thought to be a single species)

- <a href='https://en.wikipedia.org/wiki/Phalangeridae'>Wikipedia information on the Phalangeridae family</a>

- <a href='https://www.theage.com.au/national/a-tail-of-two-possums-20041203-gdz4bq.html'>A tail of two possums - The Age (Melbourne)</a>

- <a href='https://www.iucnredlist.org/species/40557/21951945'>Conservation information at Red List</a>

- <a href='https://www.departments.bucknell.edu/biology/resources/msw3/browse.asp?s=y&id=11000086'>Entry at Mammal Species of the World</a>

- <a href='https://www.youtube.com/watch?v=Cwg2rTorJWc'>Video by Brave Wilderness on the Related Brushtail Possum</a>

### Sources <a name='sources'></a>

*Original Paper*

Lindenmayer DB, Viggers KL, Cunningham RB, Donnelly CF (1995) Morphological Variation Among Populations of the Mountain Brushtail Possum, *Trichosurus caninus* Ogilby (Phalangeridae, Marsupialia). *Australian Journal of Zoology* 43, 449-458. https://doi.org/10.1071/ZO9950449

*Kaggle Dataset*

https://www.kaggle.com/datasets/abrambeyer/openintro-possum

### About the Data <a name='about_data'></a>

*Note: the original column names can be found on the Kaggle page for the data. The data dictionary below uses the column names created in the `acquire.py` module. Information needed to clean the data can be found either in the original paper by Lindenmayer et al. or in the documentation for the <a href='https://cran.r-project.org/web/packages/DAAG/index.html'>DAAG package on CRAN</a>.*

- `case` : observation/identification number of individual
- `trap_site` : the site where the individual was trapped (a numeric site id in the original data, stored here by name); the sites are:
    - Cambarville, Victoria
    - Bellbird, Victoria
    - Whian Whian State Forest, NSW
    - Byrangery Reserve, NSW
    - Conondale Ranges, Queensland
    - Bulburin State Forest, Queensland
    - Allyn River Forest Park, NSW
- `state` : the Australian state of the `trap_site` location 
- `sex` : the sex of the individual
- `age` : the age of the individual in years, determined by tooth wear (Lindenmayer)
- `head_length` : length of the head from the nose tip to the external occipital protuberance in mm
- `skull_width` : the width of the skull at the widest part in mm
- `total_length` : length of the body from the nose tip to the tail tip in mm
- `tail_length` : length from tail base to tail tip in mm
- `foot_length` : length from heel to longest toe's tip in mm
- `ear_length` : length from the base of the ear to the tip of the ear in mm
- `eye_width` : the width of the eye from medial to lateral canthus in mm
- `chest_girth` : girth behind the forelimbs in mm
- `belly_girth` : girth behind the last rib in mm
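As a rough illustration of the kind of cleaning the data dictionary reflects, the sketch below renames assumed original column names (the short names used in the DAAG `possum` data, e.g. `hdlngth`, `skullw`) to the names above, then turns numeric site ids into site and state names. The column mapping, the id-to-site ordering (taken from the order of the list above), and the function name `clean_columns` are assumptions for illustration only; this is not the actual code in `acquire.py`. Columns that are not in the mapping pass through unchanged.

```python
import pandas as pd

# ASSUMPTION: original column names as used in the DAAG `possum` data;
# the real mapping in `acquire.py` may differ.
COLUMN_NAMES = {
    "case": "case",
    "site": "trap_site",
    "sex": "sex",
    "age": "age",
    "hdlngth": "head_length",
    "skullw": "skull_width",
    "totlngth": "total_length",
    "taill": "tail_length",
    "footlgth": "foot_length",
    "earconch": "ear_length",
    "eye": "eye_width",
    "chest": "chest_girth",
    "belly": "belly_girth",
}

# ASSUMPTION: numeric site ids 1-7 follow the order of the site list above.
SITE_NAMES = {
    1: "Cambarville",
    2: "Bellbird",
    3: "Whian Whian",
    4: "Byrangery",
    5: "Conondale",
    6: "Bulburin",
    7: "Allyn River",
}

# State for each trapping site, as given in the site list above.
SITE_STATES = {
    "Cambarville": "Victoria",
    "Bellbird": "Victoria",
    "Whian Whian": "New South Wales",
    "Byrangery": "New South Wales",
    "Conondale": "Queensland",
    "Bulburin": "Queensland",
    "Allyn River": "New South Wales",
}


def clean_columns(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rename columns and map site ids to site/state names."""
    df = raw.rename(columns=COLUMN_NAMES)  # returns a new frame; `raw` is untouched
    df["trap_site"] = df["trap_site"].map(SITE_NAMES)
    # Derive the state from the site name.
    df["state"] = df["trap_site"].map(SITE_STATES)
    return df


# Made-up example rows, not taken from the dataset.
raw = pd.DataFrame({"case": [1, 2], "site": [2, 7], "hdlngth": [90.0, 92.0]})
print(clean_columns(raw))
```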

<a href='#contents'>Back to Contents</a>

## Acquire The Data <a name='acquire'></a>

In [2]:
df = acquire.make_dataset()

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 104 entries, 0 to 103
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   case          104 non-null    int64  
 1   trap_site     104 non-null    object 
 2   state         104 non-null    object 
 3   sex           104 non-null    object 
 4   age           102 non-null    float64
 5   head_length   104 non-null    float64
 6   skull_width   104 non-null    float64
 7   total_length  104 non-null    float64
 8   tail_length   104 non-null    float64
 9   foot_length   103 non-null    float64
 10  ear_length    104 non-null    float64
 11  eye_width     104 non-null    float64
 12  chest_girth   104 non-null    float64
 13  belly_girth   104 non-null    float64
dtypes: float64(10), int64(1), object(3)
memory usage: 11.5+ KB
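The `Non-Null Count` column shows gaps in `age` (102 of 104 non-null) and `foot_length` (103 of 104). A small helper such as the hypothetical `missing_summary` below tallies missing values per column directly; the toy frame uses made-up values that only mimic that pattern.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


# Made-up values: two gaps in age, one in foot_length, none in head_length.
toy = pd.DataFrame({
    "age": [1.0, np.nan, 3.0, np.nan],
    "foot_length": [70.0, 71.0, np.nan, 72.0],
    "head_length": [90.0, 91.0, 92.0, 93.0],
})
print(missing_summary(toy))
```

Applied to `df`, it should report 2 missing `age` values and 1 missing `foot_length` value, matching the counts above.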


In [4]:
df.state.value_counts(dropna=False)

Victoria           46
New South Wales    32
Queensland         26
Name: state, dtype: int64
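Because the sample is spread unevenly across the three states, the same counts can also be expressed as proportions with `value_counts(normalize=True)`, which may be handy later when deciding, for example, whether to stratify a train/test split. This sketch rebuilds a `state` column from the counts shown above rather than reading the real data.

```python
import pandas as pd

# Rebuild the state column from the counts above (46 / 32 / 26, 104 rows total).
state = pd.Series(
    ["Victoria"] * 46 + ["New South Wales"] * 32 + ["Queensland"] * 26,
    name="state",
)

# normalize=True returns each category's share of the rows instead of raw counts.
shares = state.value_counts(normalize=True).round(3)
print(shares)
```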

In [5]:
df[df.age.isna()]

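| | case | trap_site | state | sex | age | head_length | skull_width | total_length | tail_length | foot_length | ear_length | eye_width | chest_girth | belly_girth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 43 | 44 | Bellbird | Victoria | m | NaN | 85.1 | 51.5 | 760.0 | 355.0 | 70.3 | 52.6 | 14.4 | 230.0 | 270.0 |
| 45 | 46 | Bellbird | Victoria | m | NaN | 91.4 | 54.4 | 840.0 | 350.0 | 72.8 | 51.2 | 14.4 | 245.0 | 350.0 |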
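Both rows with a missing `age` still have a recorded `sex`, so one reasonable option (not necessarily the approach taken later in this project) is to drop them only when building the age regression and keep them for the sex classifier. A minimal sketch using a hypothetical helper `drop_missing_target` and made-up values:

```python
import numpy as np
import pandas as pd


def drop_missing_target(df: pd.DataFrame, target: str = "age") -> pd.DataFrame:
    """Keep only rows where `target` is present and renumber the index."""
    return df.dropna(subset=[target]).reset_index(drop=True)


# Made-up rows: cases 2 and 4 have no recorded age.
toy = pd.DataFrame({
    "case": [1, 2, 3, 4],
    "sex": ["f", "m", "f", "m"],
    "age": [3.0, np.nan, 5.0, np.nan],
})

age_model_data = drop_missing_target(toy, "age")  # rows usable for the age regression
sex_model_data = drop_missing_target(toy, "sex")  # every row has a sex, so all are kept
print(age_model_data)
print(len(sex_model_data))
```

Dropping per target, rather than calling `dropna()` on the whole frame once, avoids discarding rows that are still usable for the other model.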
