# GeoTime Classify Examples

All date formats are in the [Unicode Locale Data Markup Language (LDML)](https://unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table) format.

In [1]:
```python
import pandas as pd
from IPython.utils import io
from geotime_classify import geotime_classify as gc
```

In [2]:
```python
GeoTimeClass = gc.GeoTimeClassify(10)
```

# Example Dataset 1:

This dataset contains latitude, longitude and a date.

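The `latitude`, `longitude` and `date` fields are correctly classified, and the detected date format (`y-MM-dd`) captures the zero-padded month and day. The `value` field, however, is also labelled `Latitude (number)`, presumably because all of its values fall within the valid latitude range.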

In [5]:
```python
pd.read_csv('example_1.csv').tail()
```

| Unnamed: 0 | latitude | longitude | date | value |
|---|---|---|---|---|
| 95 | -76.733088 | -174.133691 | 1993-03-14 | 41.356871 |
| 96 | -40.785599 | -91.011154 | 2020-06-02 | 28.407152 |
| 97 | -72.360866 | 179.958728 | 2000-09-13 | 3.965644 |
| 98 | -21.09476 | 52.549315 | 1986-07-13 | 33.240188 |
| 99 | -9.961648 | 104.370319 | 1971-01-09 | 47.191888 |


In [16]:
```python
c_classified = GeoTimeClass.columns_classified('example_1.csv')
```

```
Start LSTM predictions ...
Start geo validation ...
Start geo validation ...
Start geo validation ...
```


In [17]:
```python
for c in c_classified:
    print(f"{c['column']}: {c['classification']}")
```

```
latitude: [{'Category': 'Geo', 'type': 'Latitude (number)'}]
longitude: [{'Category': 'Geo', 'type': 'Longitude (number)'}]
date: [{'Category': 'Date', 'Format': 'y-MM-dd', 'Parser': 'Util', 'DayFirst': False}]
value: [{'Category': 'Geo', 'type': 'Latitude (number)'}]
```
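
A plain range check shows why `value` is easy to confuse with a coordinate: every value in the sample rows above lies inside both the latitude and the longitude range. This is a hypothetical illustration (the `fits_latitude` and `fits_longitude` helpers are ours), not the geo validation that geotime_classify actually performs.

```python
# The five `value` entries shown in the tail() output of example_1.csv.
values = [41.356871, 28.407152, 3.965644, 33.240188, 47.191888]


def fits_latitude(xs):
    """Hypothetical check: every value lies within the latitude range [-90, 90]."""
    return all(-90.0 <= x <= 90.0 for x in xs)


def fits_longitude(xs):
    """Hypothetical check: every value lies within the longitude range [-180, 180]."""
    return all(-180.0 <= x <= 180.0 for x in xs)


# Both checks pass, so range alone cannot separate `value` from a coordinate.
print(fits_latitude(values), fits_longitude(values))  # True True
```

Distinguishing such columns therefore needs extra signals, such as the header name or the value distribution.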


## Format date columns
Beyond the predictions, we can add standardized date columns with the `add_iso8601_columns` function. This does not work in every situation, but it can be useful. Each new column is named `ISO_8601_` followed by the index of the column being transformed; here the `date` column is at index 2, so the new column is `ISO_8601_2`. Because `formats='%s'` is passed, the new column holds seconds since the Unix epoch rather than an ISO 8601 string.
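
For comparison, the same conversions can be done directly in pandas. This is a sketch, not how `add_iso8601_columns` is implemented. Computing the epoch at midnight UTC gives values 18000 s (5 h) or 14400 s (4 h) smaller than those in the `ISO_8601_2` column below, which is consistent with `%s` being evaluated in a UTC-5 local time zone with daylight saving time (such as US Eastern). Note that `%s` is a platform-specific `strftime` extension, not one of Python's documented format codes, so its behaviour may differ between systems.

```python
import pandas as pd

# The first three `date` values from the head() output below.
dates = pd.Series(["1985-01-25", "1996-08-14", "1987-01-21"])
parsed = pd.to_datetime(dates, format="%Y-%m-%d")

# An ISO 8601 date string for each value.
iso = parsed.dt.strftime("%Y-%m-%d")

# Seconds since the Unix epoch, treating each date as midnight UTC.
epoch = (parsed - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
print(epoch.tolist())  # [475459200, 839980800, 538185600]

# Difference from the ISO_8601_2 values in the notebook output.
notebook = pd.Series([475477200, 839995200, 538203600])
print((notebook - epoch).tolist())  # [18000, 14400, 18000]
```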

In [20]:

```python
with io.capture_output() as captured:
    c_classified = GeoTimeClass.add_iso8601_columns('example_1.csv', formats='%s')
```

In [21]:
```python
c_classified.head()
```

| Unnamed: 0 | latitude | longitude | date | value | ISO_8601_2 |
|---|---|---|---|---|---|
| 0 | -65.202852 | -51.321629 | 1985-01-25 | 22.439431 | 475477200 |
| 1 | 71.084338 | 80.259661 | 1996-08-14 | 31.091719 | 839995200 |
| 2 | -33.661254 | 3.735232 | 1987-01-21 | 47.118235 | 538203600 |
| 3 | -88.514946 | -165.692973 | 2012-04-28 | 16.431557 | 1335585600 |
| 4 | -81.020485 | 7.985187 | 1995-09-14 | 52.348503 | 811051200 |


# Example Dataset 2:

In this dataset, `latitude` and `longitude` are labelled `y` and `x` respectively, adding a challenge.

Additionally, this dataset has `city` and `country` columns, where `country` holds two-letter country codes such as `AU` and `NL`. The `ts` column holds dates in long form, such as `January 24, 2001`.

All columns are correctly classified. Note that the detected date format (`LLLL d, y`) uses `d` rather than `dd`, which accounts for the missing zero-padding on the day (e.g. `December 1, 1979`).
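
The LDML patterns reported by the classifier can be turned into Python `strptime` directives for parsing. The sketch below uses a hypothetical, deliberately partial mapping (`LDML_TO_STRPTIME` and `ldml_to_strptime` are ours) that covers only the fields appearing in this notebook; it does not handle quoted literals, `%` characters, or two-digit years, and parsing month names assumes an English locale.

```python
from datetime import datetime

# Hypothetical, partial mapping from LDML date fields to strptime directives.
LDML_TO_STRPTIME = {
    "LLLL": "%B",  # stand-alone full month name, e.g. "December"
    "MMMM": "%B",  # full month name
    "MM": "%m",    # zero-padded month number
    "dd": "%d",    # zero-padded day
    "d": "%d",     # day without padding; strptime's %d also accepts "1"
    "y": "%Y",     # full year
}
# Try longer tokens first so "dd" is not read as two "d" fields.
_TOKENS = sorted(LDML_TO_STRPTIME, key=len, reverse=True)


def ldml_to_strptime(pattern: str) -> str:
    """Translate a simple LDML pattern; unknown characters are copied as-is."""
    out = []
    i = 0
    while i < len(pattern):
        for token in _TOKENS:
            if pattern.startswith(token, i):
                out.append(LDML_TO_STRPTIME[token])
                i += len(token)
                break
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


fmt = ldml_to_strptime("LLLL d, y")
print(fmt)                                                 # %B %d, %Y
print(datetime.strptime("December 1, 1979", fmt).date())  # 1979-12-01
print(ldml_to_strptime("y-MM-dd"))                         # %Y-%m-%d
```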


In [22]:
```python
pd.read_csv('example_2.csv').tail()
```

| Unnamed: 0 | y | x | city | country | ts | value |
|---|---|---|---|---|---|---|
| 95 | -34.88422 | 150.60036 | Nowra | AU | January 24, 2001 | 6505 |
| 96 | 51.44889 | 5.51978 | Tongelre | NL | November 28, 1999 | 3884 |
| 97 | 45.35 | 126.28333 | Shuangcheng | CN | November 14, 1999 | 7885 |
| 98 | 44.31771 | 9.32241 | Chiavari | IT | December 1, 1979 | 6175 |
| 99 | 46.09273 | -88.64235 | Iron River | US | November 21, 1973 | 7970 |


In [23]:
```python
with io.capture_output() as captured:
    c_classified = GeoTimeClass.columns_classified('example_2.csv')
```

In [24]:
```python
for c in c_classified:
    print(f"{c['column']}: {c['classification']}")
```

```
y: [{'Category': 'Geo', 'type': 'Latitude (number)'}]
x: [{'Category': 'Geo', 'type': 'Longitude (number)'}]
city: [{'Category': 'City Name'}]
country: [{'Category': 'ISO2'}]
ts: [{'Category': 'Date', 'Format': 'LLLL d, y', 'Parser': 'Util'}]
value: [{'Category': 'None'}]
```
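
Since the results are a list of dictionaries, columns can be looked up by category, for example to rename `y` and `x` back to `latitude` and `longitude`. A minimal sketch, assuming only the `column`, `classification`, `Category` and `type` keys visible in the output above; `columns_by` is a hypothetical helper, not part of geotime_classify.

```python
# Results for example_2.csv, copied from the printed output above.
c_classified = [
    {"column": "y", "classification": [{"Category": "Geo", "type": "Latitude (number)"}]},
    {"column": "x", "classification": [{"Category": "Geo", "type": "Longitude (number)"}]},
    {"column": "city", "classification": [{"Category": "City Name"}]},
    {"column": "country", "classification": [{"Category": "ISO2"}]},
    {"column": "ts", "classification": [{"Category": "Date", "Format": "LLLL d, y", "Parser": "Util"}]},
    {"column": "value", "classification": [{"Category": "None"}]},
]


def columns_by(results, category, type_=None):
    """Return the names of columns with a matching Category (and type, if given)."""
    matches = []
    for entry in results:
        for cls in entry["classification"]:
            if cls.get("Category") == category and (type_ is None or cls.get("type") == type_):
                matches.append(entry["column"])
                break  # one match per column is enough
    return matches


lat = columns_by(c_classified, "Geo", "Latitude (number)")  # ['y']
lon = columns_by(c_classified, "Geo", "Longitude (number)")  # ['x']
dates = columns_by(c_classified, "Date")                     # ['ts']

# A mapping that could be passed to DataFrame.rename(columns=...).
renames = {lat[0]: "latitude", lon[0]: "longitude"}
print(renames)  # {'y': 'latitude', 'x': 'longitude'}
```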


# Example Dataset 3:


This dataset has a date-related challenge: the date is now split across three columns (`month`, `day`, `year`). These columns are correctly identified.

Additionally, a `region` column is introduced and is correctly categorized as `Continent`.

In [25]:
```python
pd.read_csv('example_3.csv').head()
```

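| Unnamed: 0 | lat | lng | region | place_name | month | day | year | value |
|---|---|---|---|---|---|---|---|---|
| 0 | 17.30858 | 97.01124 | Asia | Yangon | 1 | 22 | 1997 | 3660 |
| 1 | 40.60538 | -73.75513 | America | New York | 11 | 5 | 1983 | 9859 |
| 2 | 33.41012 | -91.06177 | America | Chicago | 8 | 21 | 1982 | 1906 |
| 3 | 38.58894 | -89.99038 | America | Chicago | 5 | 9 | 1991 | 2960 |
| 4 | 51.67822 | 33.9162 | Europe | Kiev | 1 | 20 | 2010 | 9377 |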
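
Once the `Month Number`, `Day Number` and `Year` columns have been identified, they can be combined into a single date. This sketch uses plain pandas rather than any geotime_classify function, and it hard-codes the column names for brevity; in practice you would take them from the classification results.

```python
import pandas as pd

# The month/day/year values from the head() of example_3.csv shown above.
df = pd.DataFrame({
    "month": [1, 11, 8, 5, 1],
    "day": [22, 5, 21, 9, 20],
    "year": [1997, 1983, 1982, 1991, 2010],
})

# pandas assembles a datetime from columns named year, month and day.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])
print(df["date"].dt.strftime("%Y-%m-%d").tolist())
# ['1997-01-22', '1983-11-05', '1982-08-21', '1991-05-09', '2010-01-20']
```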


In [26]:
```python
preds = GeoTimeClass.columns_classified('example_3.csv')
```

```
Start LSTM predictions ...
Start geo validation ...
Start geo validation ...
Start continent validation ...
Start country validation ...
Start state validation ...
Start cities validation ...
Start city validation ...
Start year validation ...
```


In [27]:
```python
preds
```

```
[{'column': 'lat',
  'classification': [{'Category': 'Geo', 'type': 'Latitude (number)'}],
  'fuzzyColumn': 'Lat'},
 {'column': 'lng',
  'classification': [{'Category': 'Geo', 'type': 'Longitude (number)'}],
  'fuzzyColumn': 'lng'},
 {'column': 'region',
  'classification': [{'Category': 'Continent'}],
  'fuzzyColumn': 'Region'},
 {'column': 'place_name', 'classification': [{'Category': 'City Name'}]},
 {'column': 'month',
  'classification': [{'Category': 'Month Number'}],
  'fuzzyColumn': 'Month'},
 {'column': 'day', 'classification': [{'Category': 'Day Number'}]},
 {'column': 'year',
  'classification': [{'Category': 'Year'}],
  'fuzzyColumn': 'Year'},
 {'column': 'value', 'classification': [{'Category': 'None'}]}]
```
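
The `fuzzyColumn` key appears for some columns (`lat`, `lng`, `region`, `month`, `year`) but not for others (`place_name`, `day`, `value`). Judging from the output, it seems to hold a fuzzy match of the header against a known column name (e.g. `region` to `Region`). Code reading these results therefore should not assume the key exists. A minimal sketch using `dict.get`, with the results copied from the output above and the `classification` entries omitted:

```python
# A subset of the example_3.csv results printed above (classification omitted).
preds = [
    {"column": "lat", "fuzzyColumn": "Lat"},
    {"column": "lng", "fuzzyColumn": "lng"},
    {"column": "region", "fuzzyColumn": "Region"},
    {"column": "place_name"},
    {"column": "month", "fuzzyColumn": "Month"},
    {"column": "day"},
    {"column": "year", "fuzzyColumn": "Year"},
    {"column": "value"},
]

# .get() returns None instead of raising KeyError when the key is absent.
fuzzy = {p["column"]: p.get("fuzzyColumn") for p in preds}
unmatched = [col for col, match in fuzzy.items() if match is None]
print(unmatched)  # ['place_name', 'day', 'value']
```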

In [None]:
```python
with io.capture_output() as captured:
    c_classified = GeoTimeClass.columns_classified('example_3.csv')
```

In [13]:
```python
for c in c_classified:
    print(f"{c['column']}: {c['classification']}")
```