In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('../input/dataset.csv')
df['Orbital Period (days)'] = df['Orbital Period (days)'].astype('float64')

**First, let's see how each planet's orbital period (how long it takes to orbit the star) varies with its distance from the star.**

In [None]:
sns.set_style('ticks')
sns.lmplot(data=df, x='Orbit Semi-Major Axis (AU)', y='Orbital Period (days)',
          hue='Planet Radius (Jupiter radii)', palette='Blues',fit_reg=False)
plt.title('Orbital Period vs. Semi-Major Axis')
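
As a rough check on how the period scales with distance in the plot above, Kepler's third law predicts that the period grows as the semi-major axis to the power 3/2, so a straight-line fit on log-log axes should have a slope near 1.5. The sketch below is one way to estimate that slope. The helper name `kepler_slope` is hypothetical, and the demo values are synthetic (generated from the formula for a Sun-like star), not taken from this dataset.

```python
import numpy as np

def kepler_slope(semi_major_axis_au, period_days):
    """Fit log10(period) = slope * log10(a) + intercept; return (slope, intercept)."""
    a = np.asarray(semi_major_axis_au, dtype=float)
    t = np.asarray(period_days, dtype=float)
    # Keep only finite, positive pairs so the logarithms are defined.
    mask = np.isfinite(a) & np.isfinite(t) & (a > 0) & (t > 0)
    slope, intercept = np.polyfit(np.log10(a[mask]), np.log10(t[mask]), 1)
    return slope, intercept

# Synthetic illustration only: periods generated from T = 365.25 * a**1.5.
demo_a = np.array([0.01, 0.02, 0.03, 0.05])
demo_t = 365.25 * demo_a ** 1.5
demo_slope, demo_intercept = kepler_slope(demo_a, demo_t)
```

With the dataframe loaded above, the call would be `kepler_slope(df['Orbit Semi-Major Axis (AU)'], df['Orbital Period (days)'])`, assuming both columns are numeric.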

>**As we move farther from the star, where its gravitational pull is weaker, the orbital period of the planet increases, as predicted by planetary physics.**  The relationship looks roughly linear within the range observed, although Kepler's third law predicts that the period grows as the semi-major axis to the power 3/2.  More interesting is the fact that planet size seems to follow a bimodal pattern.  Below is a bar plot of this effect.  Note that the letter 'a' is used to denote this system's star.

In [None]:
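# ci takes a confidence level, not per-bar values; draw the reported uncertainties via yerr
plt.bar(df['Letter'], df['Planet Radius (Jupiter radii)'], capsize=3,
        yerr=df['Planet Radius Upper Unc. (Jupiter radii)'],
        color=sns.color_palette('muted', len(df)))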

**Why?**  Are there any shielding effects?  Does the size of the closest planets prevent them from being absorbed into the star?
The data is shown as a scatter plot below.  A parabolic fit seems plausible at first glance, yet planet h, the farthest from the star, would be an outlier in the fit.

In [None]:
sns.lmplot(data=df, x='Orbit Semi-Major Axis (AU)',  y='Planet Radius (Jupiter radii)',
           fit_reg=False)
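
To test the parabolic fit suggested above rather than judging it by eye, one option is a least-squares quadratic with `np.polyfit`, followed by a look at the residuals to see which points sit far from the curve. This is a minimal sketch: the helper name `quadratic_fit` is hypothetical, and the demo values are synthetic, not from this dataset.

```python
import numpy as np

def quadratic_fit(x, y):
    """Least-squares parabola y = c2*x**2 + c1*x + c0; return (coeffs, residuals)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Drop rows with missing values before fitting.
    mask = np.isfinite(x) & np.isfinite(y)
    coeffs = np.polyfit(x[mask], y[mask], 2)
    residuals = y[mask] - np.polyval(coeffs, x[mask])
    return coeffs, residuals

# Synthetic illustration only: points lying exactly on y = 2x^2 - 3x + 1.
demo_x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
demo_y = 2.0 * demo_x ** 2 - 3.0 * demo_x + 1.0
demo_coeffs, demo_resid = quadratic_fit(demo_x, demo_y)
```

On the notebook's data the inputs would be `df['Orbit Semi-Major Axis (AU)']` and `df['Planet Radius (Jupiter radii)']`. A residual for planet h that is much larger than the others would support treating it as an outlier.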

**We will now look at mass and density.  Will they be bimodal as well?**  Note that this data is missing for planet h.
 

In [None]:
sns.barplot(data=df,y='Planet Mass (Jupiter mass)', x='Letter', 
            palette='muted')

In [None]:
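# ci takes a confidence level, not per-bar values; draw the reported uncertainties via yerr
plt.bar(df['Letter'], df['Planet Density (g/cm**3)'], capsize=3,
        yerr=df['Planet Density Upper Unc. (g/cm**3)'],
        color=sns.color_palette('muted', len(df)))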

In [None]:
sns.pairplot(data=df, x_vars=['Orbit Semi-Major Axis (AU)'], y_vars=['Planet Mass (Jupiter mass)',
             'Planet Radius (Jupiter radii)','Planet Density (g/cm**3)'],hue='Letter',palette='muted')

**Overall, variance in mass is minimal.**  The variance in radius therefore explains most of the variance in density.  At first sight, a parabolic regression seems feasible for both variables.  I should do some research to understand why planet size varies with distance from the star.
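
The link between radius and density can be made explicit: for a sphere, mean density is mass divided by volume, so at fixed mass the density falls as the cube of the radius. The sketch below computes it from values in Jupiter units. The function name `bulk_density` is hypothetical, and the Jupiter mass and equatorial radius constants are assumed nominal values, not taken from the dataset.

```python
import numpy as np

# Assumed constants (not from the notebook): nominal Jupiter mass and
# equatorial radius in CGS units.
JUPITER_MASS_G = 1.898e30
JUPITER_RADIUS_CM = 7.1492e9

def bulk_density(mass_mjup, radius_rjup):
    """Mean density in g/cm**3 of a sphere, given mass and radius in Jupiter units."""
    mass_g = np.asarray(mass_mjup, dtype=float) * JUPITER_MASS_G
    radius_cm = np.asarray(radius_rjup, dtype=float) * JUPITER_RADIUS_CM
    volume_cm3 = 4.0 / 3.0 * np.pi * radius_cm ** 3
    return mass_g / volume_cm3

# At fixed mass, doubling the radius cuts the density by a factor of 8.
ratio = bulk_density(1.0, 1.0) / bulk_density(1.0, 2.0)
```

Comparing `bulk_density(df['Planet Mass (Jupiter mass)'], df['Planet Radius (Jupiter radii)'])` with the `Planet Density (g/cm**3)` column would show how much of the density variation follows from radius alone.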