# METAR Data Analysis Project

METAR is the acronym for Meteorological Aerodrome Report. It is internationally recognized shorthand for weather data used by the aviation community.

### The Problem:

A flight school is looking to expand to new locations. You are given METAR data and asked to identify the 10 best and 10 worst locations based on the following criteria:

* Visibility of 10 statute miles or greater 

* Cloud ceiling of 3,000 ft above ground or higher

* Winds less than 15 kts
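The three criteria above can be sketched as a single check. This is a minimal illustration, not the notebook's method: the function name, the constants, and the choice to treat a missing ceiling (for example a `CLR` report) as passing are all assumptions.

```python
from typing import Optional

# Thresholds taken from the criteria list; the constant names are hypothetical.
MIN_VISIBILITY_SM = 10
MIN_CEILING_FT = 3000
MAX_WIND_KT = 15


def is_good_flying_weather(visibility_sm: float,
                           ceiling_ft: Optional[float],
                           wind_kt: float) -> bool:
    """Return True when one observation meets all three criteria.

    ceiling_ft=None means no ceiling was reported; treating that
    as a pass is an assumption of this sketch.
    """
    visibility_ok = visibility_sm >= MIN_VISIBILITY_SM        # 10 SM or greater
    ceiling_ok = ceiling_ft is None or ceiling_ft >= MIN_CEILING_FT  # 3,000 ft or higher
    wind_ok = wind_kt < MAX_WIND_KT                           # strictly less than 15 kts
    return visibility_ok and ceiling_ok and wind_ok


# Values read by hand from two reports: 08005KT 10SM CLR and 26016KT 4SM OVC007.
print(is_good_flying_weather(10, None, 5))
print(is_good_flying_weather(4, 700, 16))
```

Ranking locations would then come down to the share of observations per station that pass this check.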

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
metar = 'metar_export.txt'

In [3]:
df = pd.read_csv(metar, delimiter='\t')

In [4]:
df.head()

Unnamed: 0,2015-03-25 21:15:00,KCXP 252115Z AUTO 08005KT 10SM CLR 17/M01 A3033 RMK AO2
0,2015-03-25 21:13:00,KWRB 252113Z AUTO 06005KT 10SM BKN014 OVC024 2...
1,2015-03-25 21:12:00,KTUL 252112Z 14006KT 10SM TS BKN035 BKN120 BKN...
2,2015-03-25 21:11:00,KDRT 252111Z AUTO 13007KT 10SM SCT023 22/17 A2...
3,2015-03-25 21:10:00,KBRL 252110Z AUTO 33009KT 8SM SCT016 07/03 A30...
4,2015-03-25 21:10:00,KCMX 252110Z AUTO 26016KT 4SM -SN BR OVC007 01...


In [5]:
df

Unnamed: 0,2015-03-25 21:15:00,KCXP 252115Z AUTO 08005KT 10SM CLR 17/M01 A3033 RMK AO2
0,2015-03-25 21:13:00,KWRB 252113Z AUTO 06005KT 10SM BKN014 OVC024 2...
1,2015-03-25 21:12:00,KTUL 252112Z 14006KT 10SM TS BKN035 BKN120 BKN...
2,2015-03-25 21:11:00,KDRT 252111Z AUTO 13007KT 10SM SCT023 22/17 A2...
3,2015-03-25 21:10:00,KBRL 252110Z AUTO 33009KT 8SM SCT016 07/03 A30...
4,2015-03-25 21:10:00,KCMX 252110Z AUTO 26016KT 4SM -SN BR OVC007 01...
...,...,...
7731235,2016-04-28 18:16:00,KWRI 281816Z AUTO 10007KT 10SM -RA FEW048 OVC0...
7731236,2016-04-28 18:15:00,KUES 281815Z 07012G16KT 10SM OVC008 03/02 A2997
7731237,2016-04-28 18:14:00,KVPS 281814Z 20011KT 10SM BKN012 OVC022 25/22 ...
7731238,2016-04-28 18:13:00,KRCA 281813Z AUTO 34007KT 10SM OVC010 03/03 A3...


In [6]:
df.shape

(7731240, 2)

In [7]:
df.describe()

Unnamed: 0,2015-03-25 21:15:00,KCXP 252115Z AUTO 08005KT 10SM CLR 17/M01 A3033 RMK AO2
count,7731240,7731240
unique,465271,7731199
top,2015-10-12 00:53:00,KEYW 261153Z 03003KT 10SM SCT024 SCT031 29/26 ...
freq,324,2


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7731240 entries, 0 to 7731239
Data columns (total 2 columns):
 #   Column                                                   Dtype 
---  ------                                                   ----- 
 0   2015-03-25 21:15:00                                      object
 1   KCXP 252115Z AUTO 08005KT 10SM CLR 17/M01 A3033 RMK AO2  object
dtypes: object(2)
memory usage: 118.0+ MB
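The column labels in the output above are themselves an observation, which suggests `read_csv` consumed the first row of the file as a header. A minimal sketch of one way to keep that row, assuming the file has no header line: the column names `timestamp` and `report` are hypothetical, and a two-row in-memory sample stands in for `metar_export.txt`.

```python
import io

import pandas as pd

# Two complete rows copied from the outputs above, joined by a tab as in the file.
sample = (
    "2015-03-25 21:15:00\tKCXP 252115Z AUTO 08005KT 10SM CLR 17/M01 A3033 RMK AO2\n"
    "2016-04-28 18:15:00\tKUES 281815Z 07012G16KT 10SM OVC008 03/02 A2997\n"
)

# header=None stops pandas from using the first row as column labels;
# names= supplies labels instead (both names are assumptions).
raw = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,
    names=["timestamp", "report"],
    parse_dates=["timestamp"],
)

print(raw.shape)
print(raw.dtypes)
```

Keeping the raw report as a single string column also leaves the door open to parsing each report token by token later.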


In [9]:
col_names = ['date', 'id', 'obsv_time', 'automated', 'wind_speed', 'visibility', 'cloud_cover',
             'temp_dewpt', 'altimeter', 'altimeter_remarks', 'ao2']

In [10]:
df2 = pd.read_csv(metar, delimiter=' ', header=None, names=col_names)

Reference for decoding METAR codes (PDF):
https://business.desu.edu/sites/business/files/document/16/metar_and_taf_codes.pdf

In [11]:
df2.head()

Unnamed: 0,date,id,obsv_time,automated,wind_speed,visibility,cloud_cover,temp_dewpt,altimeter,altimeter_remarks,ao2
0,2015-03-25,21:15:00\tKCXP,252115Z,AUTO,08005KT,10SM,CLR,17/M01,A3033,RMK,AO2
1,2015-03-25,21:13:00\tKWRB,252113Z,AUTO,06005KT,10SM,BKN014,OVC024,20/17,A3009,RMK
2,2015-03-25,21:12:00\tKTUL,252112Z,14006KT,10SM,TS,BKN035,BKN120,BKN250,24/16,A2975
3,2015-03-25,21:11:00\tKDRT,252111Z,AUTO,13007KT,10SM,SCT023,22/17,A2986,RMK,AO2
4,2015-03-25,21:10:00\tKBRL,252110Z,AUTO,33009KT,8SM,SCT016,07/03,A3006,RMK,AO2


In [12]:
df2['automated'].value_counts()

AUTO           4361474
00000KT         420787
VRB03KT          40599
COR              34505
VRB04KT          32273
                ...   
25034G46KT           1
28015G20             1
130015G20KT          1
18014G16KT           1
06007G22KT           1
Name: automated, Length: 17944, dtype: int64

In [13]:
# Split the tab-joined "time<TAB>station" field into separate time and id columns.
df2[['time', 'id']] = df2['id'].str.split('\t', n=1, expand=True)


In [14]:
df2.head()

Unnamed: 0,date,id,obsv_time,automated,wind_speed,visibility,cloud_cover,temp_dewpt,altimeter,altimeter_remarks,ao2,time
0,2015-03-25,KCXP,252115Z,AUTO,08005KT,10SM,CLR,17/M01,A3033,RMK,AO2,21:15:00
1,2015-03-25,KWRB,252113Z,AUTO,06005KT,10SM,BKN014,OVC024,20/17,A3009,RMK,21:13:00
2,2015-03-25,KTUL,252112Z,14006KT,10SM,TS,BKN035,BKN120,BKN250,24/16,A2975,21:12:00
3,2015-03-25,KDRT,252111Z,AUTO,13007KT,10SM,SCT023,22/17,A2986,RMK,AO2,21:11:00
4,2015-03-25,KBRL,252110Z,AUTO,33009KT,8SM,SCT016,07/03,A3006,RMK,AO2,21:10:00


Rows without an AUTO token are shifted one column to the left, so the values that landed in the `automated` column need to move over.

In [16]:
# Count rows whose `automated` field is the AUTO token, and rows where it is anything else.
auto = (df2['automated'] == 'AUTO').sum()
not_auto = len(df2) - auto

print(auto)
print(not_auto)

4361474
3369767


In [17]:
print(auto / len(df2))
print(not_auto / len(df2))

0.564136339819183
0.435863660180817


About 44% of the rows have no AUTO token. These rows can either be dropped or shifted one column to the right.

To do: decide whether to shift them.
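The shifting option could look roughly like the sketch below. It is an illustration on a two-row toy frame with a subset of the columns, not a tested fix for the full dataset: the names `demo`, `shift_cols`, and `needs_shift` are hypothetical, and the last token of each shifted row falls off the end unless a spare column is added first.

```python
import pandas as pd

cols = ["id", "obsv_time", "automated", "wind_speed", "visibility"]
demo = pd.DataFrame(
    [
        ["KCXP", "252115Z", "AUTO", "08005KT", "10SM"],  # has the AUTO token
        ["KTUL", "252112Z", "14006KT", "10SM", "TS"],    # no AUTO, so shifted left
    ],
    columns=cols,
)

# Columns from `automated` onward are the ones affected by a missing AUTO token.
shift_cols = cols[cols.index("automated"):]
needs_shift = demo["automated"] != "AUTO"

# Move each value one column to the right and blank out `automated`.
values = demo.loc[needs_shift, shift_cols].to_numpy()
demo.loc[needs_shift, shift_cols[1:]] = values[:, :-1]
demo.loc[needs_shift, "automated"] = None

print(demo)
```

Because reports carry a varying number of weather and cloud groups, a single shift only realigns the first few fields; later columns can still be misaligned.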

---

In [18]:
df = df2.copy()

Should I create two separate dataframes, or ignore the `automated` column altogether?

Try rule-based approaches:

* A token ending in KT is the wind group, wherever it appears in the report

* The same idea works for the AUTO token
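The rule-based idea above can be sketched with a regular expression that scans the tokens of a raw report instead of relying on column position. The pattern and the helper names `parse_wind` and `is_automated` are assumptions for illustration; malformed groups such as `28015G20` (no `KT` suffix) simply fail to match.

```python
import re
from typing import Optional

# Direction (three digits or VRB), speed (two or three digits),
# optional gust after "G", then the KT suffix.
WIND_RE = re.compile(
    r"^(?P<dir>\d{3}|VRB)(?P<speed>\d{2,3})(?:G(?P<gust>\d{2,3}))?KT$"
)


def parse_wind(report: str) -> Optional[dict]:
    """Return the first wind group found in a raw report, or None."""
    for token in report.split():
        match = WIND_RE.match(token)
        if match:
            gust = match.group("gust")
            return {
                "direction": match.group("dir"),
                "speed_kt": int(match.group("speed")),
                "gust_kt": int(gust) if gust else None,
            }
    return None


def is_automated(report: str) -> bool:
    """True when the report contains a standalone AUTO token."""
    return "AUTO" in report.split()


report = "KUES 281815Z 07012G16KT 10SM OVC008 03/02 A2997"
print(parse_wind(report))
print(is_automated(report))
```

Working from the raw report string sidesteps the column shift entirely, since each group is recognized by its shape rather than its position.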



In [19]:
df.head()

Unnamed: 0,date,id,obsv_time,automated,wind_speed,visibility,cloud_cover,temp_dewpt,altimeter,altimeter_remarks,ao2,time
0,2015-03-25,KCXP,252115Z,AUTO,08005KT,10SM,CLR,17/M01,A3033,RMK,AO2,21:15:00
1,2015-03-25,KWRB,252113Z,AUTO,06005KT,10SM,BKN014,OVC024,20/17,A3009,RMK,21:13:00
2,2015-03-25,KTUL,252112Z,14006KT,10SM,TS,BKN035,BKN120,BKN250,24/16,A2975,21:12:00
3,2015-03-25,KDRT,252111Z,AUTO,13007KT,10SM,SCT023,22/17,A2986,RMK,AO2,21:11:00
4,2015-03-25,KBRL,252110Z,AUTO,33009KT,8SM,SCT016,07/03,A3006,RMK,AO2,21:10:00
