# Data
## A journey from a cluster of pulsars to the whole sky
## And from megabytes to exabytes
### Natasha Hurley-Walker
### 20th March 2018

<img src=media/NHW_AMI.jpg width=1024>
#### Arcminute Microkelvin Imager, Cambridge, UK

<img src=media/NHW_Beta_receiver.JPG width=600>
#### Commissioning the Murchison Widefield Array, Murchison, WA

<img src=media/NHW_MWA.JPG width=1024>
#### Commissioning the Murchison Widefield Array, Murchison, WA

<img src=media/NHW_TEDx.jpg width=1024>
#### TEDx Perth 2016

# How did I get to work on radio telescopes?

## It started with a summer project...

<img src=media/JBO_Lovell.jpg width=1024>
#### Lovell telescope, Jodrell Bank Observatory, UK

<img src=media/Pulsar_Group-16-02-05.jpg width=1024>
#### JBO Pulsar Group 2005

<img src=media/Parkes.jpg width=1024>
#### Parkes Radio Telescope, NSW

<video controls loop=1 autoplay=1 src="media/lighthouse.m4v" width=1024 />
#### A rotating pulsating neutron star: a pulsar

<img src=media/Globular_cluster_47_Tucanae.jpg width=500>
#### Globular cluster 47 Tucanae

<img src=media/DLT.jpg width=600>
#### My first big data project

In [1]:
import plotly.plotly as py
import plotly.graph_objs as go
# Without an internet connection, use plotly's offline mode instead:
# import plotly.offline as pyo
# pyo.init_notebook_mode(connected=False)
# ...and call pyo.iplot wherever py.iplot appears below

# Create random data with numpy
import numpy as np

N = 500
Amplitude = 0.2
random_x = np.linspace(0, 1, N)
random_y = Amplitude*np.random.randn(N)

In [2]:
# Hardcode a signal: a three-sample pulse every 200 samples

random_y[50] += 0.5
random_y[51] += 1.0
random_y[52] += 0.5

random_y[250] += 0.5
random_y[251] += 1.0
random_y[252] += 0.5

random_y[450] += 0.5
random_y[451] += 1.0
random_y[452] += 0.5

In [3]:
# Plot
trace = go.Scatter(
    x = random_x,
    y = random_y
)

data = [trace]

layout = dict(autosize=False,
              width=1024,
              height=500)

py.iplot(dict(data=data, layout=layout), filename='basic-line')

# Computers need to do the heavy lifting
## In this case, split up the data by multiple time intervals
## Stack them up
## At the rotation period of the pulsar, all the pulses will align
## A signal will pop out!
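
The stacking steps above can be sketched in a few lines of numpy. This is a minimal illustration, not the pipeline used for the real pulsar search: the helper `fold`, the 2000-sample length, and the trial range of 100 to 300 samples are all assumptions made for the example. The series is longer than the 500-sample demo so that more pulses stack.

```python
import numpy as np


def fold(series, period):
    """Cut `series` into consecutive chunks of `period` samples and average them."""
    n_chunks = len(series) // period
    chunks = series[:n_chunks * period].reshape(n_chunks, period)
    return chunks.mean(axis=0)


# Simulated time series: Gaussian noise plus a three-sample pulse every 200 samples
rng = np.random.default_rng(0)
n_samples = 2000       # assumption: longer than the 500-sample demo
true_period = 200      # pulse spacing, in samples
series = 0.2 * rng.standard_normal(n_samples)
for start in range(50, n_samples, true_period):
    series[start:start + 3] += [0.5, 1.0, 0.5]

# Fold at each trial period and record the height of the tallest bin
trial_periods = np.arange(100, 301)
peak_heights = np.array([fold(series, p).max() for p in trial_periods])
best_period = int(trial_periods[peak_heights.argmax()])

print("Best trial period:", best_period, "samples")
```

At the wrong period the pulses land in different bins and average away into the noise; at the right one they pile up in the same bin. The trial range stops short of 400 samples because folding at twice the true period also lines the pulses up.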

<img src=media/pulses.jpg>

<img src=media/Joy_Division.jpg width=400>

# If a computer can do all the work, why do we still need astronomers?

In [4]:
import plotly.plotly as py
import plotly.graph_objs as go

# Add data
month = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
         'August', 'September', 'October', 'November', 'December']
monthly_avg_temp = np.array([ 23.15,  29.95,  41.2 ,  45.1 ,  59.5 ,  65.75,  67.1 ,  67.45,
        60.95,  51.7 ,  38.35,  22.6 ])

# Convert Fahrenheit to Celsius
def FtoC(F):
    C = (F - 32) * (5./9.)
    return C

monthly_avg_temp = FtoC(monthly_avg_temp)

In [5]:
# Create and style traces
trace = go.Scatter(
    x = month,
    y = monthly_avg_temp,
    name = 'Monthly average temperature',
    line = dict(
        color = ('rgb(205, 12, 24)'),
        width = 4)
)

data = [trace]

# Edit the layout
layout = dict(title = 'Monthly average temperatures in New York',
              xaxis = dict(title = 'Month'),
              yaxis = dict(title = 'Temperature (C)'),
              autosize=False,
              width=1024,
              height=500
              )

In [6]:
fig = dict(data=data, layout=layout)
py.iplot(fig, filename='styled-line')

## What can a computer tell us?

In [7]:
# The average monthly temperature
print(np.average(monthly_avg_temp))

8.74074074074


In [8]:
# The standard deviation of temperatures
print(np.std(monthly_avg_temp))

8.93970093143


In [9]:
# The average temperature of the hottest month
print(np.max(monthly_avg_temp))

19.6944444444


In [10]:
# The average temperature of the coldest month
print(np.min(monthly_avg_temp))

-5.22222222222


## Lies, damned lies, and statistics

In [11]:
fig2 = py.get_figure("https://plot.ly/~nhurleywalker/9/")

In [12]:
fig2.layout.autosize=False
fig2.layout.width=1024
fig2.layout.height=800

In [13]:
py.iplot(fig2)

#### Computers can't "see" the difference!
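
As a stand-in illustration of this point (not the figure loaded above, whose data are not reproduced here), Anscombe's quartet is a classic set of four small datasets with nearly identical summary statistics that look completely different when plotted. A minimal sketch, assuming only numpy:

```python
import numpy as np

# Anscombe's quartet (1973): a stand-in example, not the data in the figure above
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Mean of y and the x-y correlation for each dataset, rounded to two decimals
summary = {}
for name, (x, y) in quartet.items():
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_y = round(float(y.mean()), 2)
    corr_xy = round(float(np.corrcoef(x, y)[0, 1]), 2)
    summary[name] = (mean_y, corr_xy)
    print(name, "mean y =", mean_y, " correlation =", corr_xy)
```

To two decimal places the four rows agree, so these summary numbers alone cannot tell the datasets apart. A scatter plot of each one can.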

# A human view
<img src=media/Search_Results.png width=1024>

In [14]:
trace1 = go.Bar(
    x=["Masters", "PhD", "MWACS", "GLEAM", "GLEAM-X"],
    y=[3.5, 2.6, 5.0, 1000.0, 10000.0],
    name = "Raw data"
)
trace2 = go.Bar(
    x=["Masters", "PhD", "MWACS", "GLEAM", "GLEAM-X"],
    y=[.000100, .000050, 0.001000, 0.070000, 1.000000],
    name = "Final data"
)


data1 = [trace1, trace2]

# Edit the layout
layout1 = go.Layout(
    barmode='group',
    title = 'Data used over my professional career',
    xaxis = dict(title = 'Project'),
    yaxis = dict(title = 'Data size (TB)', type='log'),
    autosize=False,
    width=1000,
    height=750
)


In [15]:
fig1 = go.Figure(data=data1, layout=layout1)
py.iplot(fig1)

# A flood of data

# What do we do??

## Citizen science...

In [16]:
from IPython.display import IFrame
IFrame("http://gleamoscope.icrar.org/gleamoscope/trunk/src/?w=2.2&l=355.5&b=-1&z=3",1024,600)

## Machine learning...
<img src=media/ML.jpeg height=400>

### More this
<img src=media/robots_at_desks.jpg width=1024>

### Less this
<img src=media/phd_brain.gif width=1024>

# Questions?

## Plus: Download the app from the Google Play store: search for "GLEAM". Or:
<img src=media/QR_GLEAM.png width=150>
### Plus plus: Watch the TED talk!
<a href="https://www.ted.com/talks/natasha_hurley_walker_how_radio_telescopes_show_us_unseen_galaxies">https://www.ted.com/talks/<br>natasha_hurley_walker_how_radio_telescopes_show_us_unseen_galaxies</a>
#### Plus plus plus: Download this talk and play with the data:
<a href="https://github.com/nhurleywalker/2018-03-20-SciTech-Data">https://github.com/nhurleywalker/2018-03-20-SciTech-Data</a>

