# GeoTime Classify Examples

All date formats are in the [Unicode Locale Data Markup Language (LDML)](https://unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table) format.
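
The `Format` values reported below are LDML patterns, not Python `strptime` directives. Here is a minimal sketch for parsing a column with a detected format using only the standard library. It translates the few symbols that appear in these examples. The helper `ldml_to_strptime` is hypothetical and not part of `geotime_classify`. It deliberately ignores quoted literals, weekdays, times and most other LDML fields.

```python
from datetime import datetime

# Hypothetical mapping from a few LDML date-field symbols to strptime
# directives. Longer symbols come first so "MMMM" wins over "MM" and "M".
LDML_TO_STRPTIME = [
    ("yyyy", "%Y"),  # four-digit year
    ("yy", "%y"),    # two-digit year
    ("y", "%Y"),     # year, no padding (four digits in these examples)
    ("LLLL", "%B"),  # full month name, stand-alone form
    ("MMMM", "%B"),  # full month name
    ("MMM", "%b"),   # abbreviated month name
    ("MM", "%m"),    # zero-padded month number
    ("M", "%m"),     # month number
    ("dd", "%d"),    # zero-padded day of month
    ("d", "%d"),     # day of month
]


def ldml_to_strptime(pattern: str) -> str:
    """Translate a simple LDML pattern into a strptime format string."""
    out = []
    i = 0
    while i < len(pattern):
        for symbol, directive in LDML_TO_STRPTIME:
            if pattern.startswith(symbol, i):
                out.append(directive)
                i += len(symbol)
                break
        else:
            # Anything else (spaces, "-", ",") is copied through literally.
            out.append(pattern[i])
            i += 1
    return "".join(out)


fmt = ldml_to_strptime("y-MM-dd")
print(fmt)                                          # %Y-%m-%d
print(datetime.strptime("1985-01-25", fmt).date())  # 1985-01-25
```

When parsing, Python's `%d` and `%m` also accept unpadded values, so this translation cannot tell `dd` from `d`. That is convenient for parsing but hides format-detection mistakes.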

```python
import pandas as pd
from IPython.utils import io
from geotime_classify import geotime_classify as gc
```

```python
GeoTimeClass = gc.GeoTimeClassify(100)
```

# Example Dataset 1:

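This dataset contains latitude, longitude, a date and a numeric `value` column.

The `latitude`, `longitude` and `date` columns are correctly classified, including the zero-padded month and day in the `date` field (`y-MM-dd`). The `value` column, however, is misclassified as `Latitude (number)`. This is likely because the values shown in the preview all fall within the valid latitude range of -90 to 90.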

```python
pd.read_csv('example_1.csv').head()
```

| Unnamed: 0 | latitude | longitude | date | value |
|---|---|---|---|---|
| 0 | -65.202852 | -51.321629 | 1985-01-25 | 22.439431 |
| 1 | 71.084338 | 80.259661 | 1996-08-14 | 31.091719 |
| 2 | -33.661254 | 3.735232 | 1987-01-21 | 47.118235 |
| 3 | -88.514946 | -165.692973 | 2012-04-28 | 16.431557 |
| 4 | -81.020485 | 7.985187 | 1995-09-14 | 52.348503 |


```python
# Capture (and so hide) anything printed while the columns are classified
with io.capture_output() as captured:
    c_classified = GeoTimeClass.columns_classified('example_1.csv')
```

```python
for c in c_classified:
    print(f"{c['column']}: {c['classification']}")
```

```
latitude: [{'Category': 'Geo', 'type': 'Latitude (number)'}]
longitude: [{'Category': 'Geo', 'type': 'Longitude (number)'}]
date: [{'Category': 'Date', 'Format': 'y-MM-dd', 'Parser': 'Util', 'DayFirst': False}]
value: [{'Category': 'Geo', 'type': 'Latitude (number)'}]
```
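
The `value` column is reported as `Latitude (number)` even though it is a generic measurement. The notebook does not say how the classifier decides. One plausible contributor, assumed here and not taken from the library, is that every sampled value lies in the valid latitude range of -90 to 90. The sketch below applies simple range checks to the five preview rows only. The helpers `fits_latitude_range` and `fits_longitude_range` are hypothetical and are not the library's algorithm.

```python
# Hypothetical range checks: on range alone, a numeric column whose values
# all fall inside [-90, 90] looks the same as a latitude column.
def fits_latitude_range(values):
    return all(-90.0 <= v <= 90.0 for v in values)


def fits_longitude_range(values):
    return all(-180.0 <= v <= 180.0 for v in values)


# First five rows of example_1.csv, copied from the preview above.
latitude = [-65.202852, 71.084338, -33.661254, -88.514946, -81.020485]
longitude = [-51.321629, 80.259661, 3.735232, -165.692973, 7.985187]
value = [22.439431, 31.091719, 47.118235, 16.431557, 52.348503]

for name, col in [("latitude", latitude), ("longitude", longitude), ("value", value)]:
    print(name, fits_latitude_range(col), fits_longitude_range(col))
# latitude True True
# longitude False True
# value True True
```

Range checks alone put `value` in the same bucket as `latitude`. Telling them apart needs other signals, such as the column name or the surrounding columns.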


# Example Dataset 2:

In this dataset, latitude and longitude are labelled `y` and `x` respectively, which makes them harder to recognize by name.

This dataset also has a `city` column and a `country` column containing two-letter country codes. The `ts` column holds long-form dates such as `February 17, 1983`.

All columns are correctly classified except `ts`. It is correctly categorized as a `Date`, but the detected format is:

```
LLLL dd, y
```

It is actually:

```
LLLL d, y
```

since the day is not zero-padded (for example, `April 7, 1996`).
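
The difference between `dd` and `d` matters to a strict parser: `dd` requires exactly two digits, while `d` allows one or two. The sketch below illustrates this with a hypothetical helper, `ldml_regex`, which turns a handful of LDML symbols into a regular expression. This is an assumption for demonstration, not how `geotime_classify` validates formats. It tests three `ts` values from `example_2.csv`.

```python
import re

# Hypothetical translation of a few LDML symbols into regular expressions,
# used only to show how "dd" (two-digit day) and "d" (one or two digits) differ.
LDML_TO_REGEX = [
    ("LLLL", r"[A-Z][a-z]+"),  # full month name (English)
    ("dd", r"\d{2}"),          # zero-padded day
    ("d", r"\d{1,2}"),         # day without padding
    ("y", r"\d+"),             # year, one or more digits
]


def ldml_regex(pattern: str) -> re.Pattern:
    parts = []
    i = 0
    while i < len(pattern):
        for symbol, rx in LDML_TO_REGEX:
            if pattern.startswith(symbol, i):
                parts.append(rx)
                i += len(symbol)
                break
        else:
            # Literal characters such as spaces and commas are escaped.
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


dates = ["February 17, 1983", "April 7, 1996", "December 5, 2012"]
for pattern in ["LLLL dd, y", "LLLL d, y"]:
    rx = ldml_regex(pattern)
    print(pattern, [bool(rx.fullmatch(s)) for s in dates])
# LLLL dd, y [True, False, False]
# LLLL d, y [True, True, True]
```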

```python
pd.read_csv('example_2.csv').head()
```

| Unnamed: 0 | y | x | city | country | ts | value |
|---|---|---|---|---|---|---|
| 0 | 38.96667 | -0.18333 | Gandia | ES | February 17, 1983 | 9624 |
| 1 | 17.06542 | -96.72365 | Oaxaca | MX | April 7, 1996 | 8973 |
| 2 | 59.33333 | 18.28333 | Boo | SE | December 5, 2012 | 9348 |
| 3 | 36.20829 | -115.98391 | Pahrump | US | October 27, 2011 | 7594 |
| 4 | 54.03876 | 43.91385 | Kovylkino | RU | July 20, 1998 | 96 |


```python
# Capture (and so hide) anything printed while the columns are classified
with io.capture_output() as captured:
    c_classified = GeoTimeClass.columns_classified('example_2.csv')
```

```python
for c in c_classified:
    print(f"{c['column']}: {c['classification']}")
```

```
y: [{'Category': 'Geo', 'type': 'Latitude (number)'}]
x: [{'Category': 'Geo', 'type': 'Longitude (number)'}]
city: [{'Category': 'City Name'}]
country: [{'Category': 'ISO2'}]
ts: [{'Category': 'Date', 'Format': 'LLLL dd, y', 'Parser': 'Util'}]
value: [{'Category': 'None'}]
```
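
Each result exposes a `column` and a `classification`, and the classification is a list of dicts with a `Category` key. Assuming each result behaves like a dict with those keys, as the loop above suggests, the results can be regrouped by category. This sketch rebuilds the example 2 results by hand from the printed output instead of calling the library, so it runs without `example_2.csv`. The helper `columns_by_category` is hypothetical.

```python
from collections import defaultdict

# Example 2 results, copied by hand from the printed output above. Only the
# "column" and "classification" keys that the loop above reads are used.
c_classified = [
    {"column": "y", "classification": [{"Category": "Geo", "type": "Latitude (number)"}]},
    {"column": "x", "classification": [{"Category": "Geo", "type": "Longitude (number)"}]},
    {"column": "city", "classification": [{"Category": "City Name"}]},
    {"column": "country", "classification": [{"Category": "ISO2"}]},
    {"column": "ts", "classification": [{"Category": "Date", "Format": "LLLL dd, y", "Parser": "Util"}]},
    {"column": "value", "classification": [{"Category": "None"}]},
]


def columns_by_category(results):
    """Hypothetical helper: map each Category to the columns labelled with it."""
    grouped = defaultdict(list)
    for entry in results:
        for label in entry["classification"]:
            grouped[label["Category"]].append(entry["column"])
    return dict(grouped)


groups = columns_by_category(c_classified)
print(groups["Geo"])   # ['y', 'x']
print(groups["Date"])  # ['ts']

# Detected LDML format for every Date column.
date_formats = {
    entry["column"]: label["Format"]
    for entry in c_classified
    for label in entry["classification"]
    if label["Category"] == "Date"
}
print(date_formats)  # {'ts': 'LLLL dd, y'}
```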


# Example Dataset 3:


This dataset has a date-related challenge: the date is split across three columns (`month`, `day`, `year`). All three are correctly identified.

Additionally, a `region` column is introduced and correctly categorized as `Continent`.

```python
pd.read_csv('example_3.csv').head()
```

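| Unnamed: 0 | lat | lng | region | place_name | month | day | year | value |
|---|---|---|---|---|---|---|---|---|
| 0 | 17.30858 | 97.01124 | Asia | Yangon | 1 | 22 | 1997 | 3660 |
| 1 | 40.60538 | -73.75513 | America | New York | 11 | 5 | 1983 | 9859 |
| 2 | 33.41012 | -91.06177 | America | Chicago | 8 | 21 | 1982 | 1906 |
| 3 | 38.58894 | -89.99038 | America | Chicago | 5 | 9 | 1991 | 2960 |
| 4 | 51.67822 | 33.9162 | Europe | Kiev | 1 | 20 | 2010 | 9377 |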


```python
# Capture (and so hide) anything printed while the columns are classified
with io.capture_output() as captured:
    c_classified = GeoTimeClass.columns_classified('example_3.csv')
```

```python
for c in c_classified:
    print(f"{c['column']}: {c['classification']}")
```

```
lat: [{'Category': 'Geo', 'type': 'Latitude (number)'}]
lng: [{'Category': 'Geo', 'type': 'Longitude (number)'}]
region: [{'Category': 'Continent'}]
place_name: [{'Category': 'City Name'}]
month: [{'Category': 'Month Number'}]
day: [{'Category': 'Day Number'}]
year: [{'Category': 'Year'}]
value: [{'Category': 'None'}]
```
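
Once `month`, `day` and `year` are identified, a natural next step is to combine them into a single date column. Here is a minimal sketch with pandas. It assumes the columns are named exactly `month`, `day` and `year`, as in this dataset, because `pd.to_datetime` recognizes those names when given a DataFrame. It uses the five preview rows instead of reading the file.

```python
import pandas as pd

# First five rows of example_3.csv (date columns only), copied from the preview above.
df = pd.DataFrame({
    "month": [1, 11, 8, 5, 1],
    "day": [22, 5, 21, 9, 20],
    "year": [1997, 1983, 1982, 1991, 2010],
})

# pd.to_datetime assembles a datetime from a DataFrame whose columns are
# named year, month and day.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])
print(df["date"].dt.strftime("%Y-%m-%d").tolist())
# ['1997-01-22', '1983-11-05', '1982-08-21', '1991-05-09', '2010-01-20']
```

If the classifier's column names differed (for example `yr` or `mon`), the columns would need renaming to `year`, `month` and `day` before this call.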
