The following latitude and longitude formats are supported by the `output_format` parameter:

* Decimal degrees (dd): 41.5
* Decimal degrees hemisphere (ddh): "41.5° N"
* Degrees minutes (dm): "41° 30′ N"
* Degrees minutes seconds (dms): "41° 30′ 0″ N"
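The relationship between these four formats can be sketched in plain Python. The helper below, `format_coordinate`, is a hypothetical illustration and not part of dataprep; in particular, the number formatting (trailing zeros trimmed via `:g`) is an assumption and may differ from what `clean_lat_long()` prints.

```python
def format_coordinate(value, is_lat, output_format="dd"):
    """Render a decimal-degree value in one of the four listed formats.

    Hypothetical helper for illustration; not part of dataprep.
    """
    if output_format == "dd":
        return value

    # The sign selects the hemisphere letter; the magnitude stays positive.
    if is_lat:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    magnitude = abs(value)

    if output_format == "ddh":
        return f"{magnitude:g}° {hemisphere}"
    if output_format == "dm":
        # Work in minutes and round to suppress floating-point noise.
        degrees, minutes = divmod(round(magnitude * 60, 6), 60)
        return f"{int(degrees)}° {minutes:g}′ {hemisphere}"
    if output_format == "dms":
        # Work in seconds and round to suppress floating-point noise.
        degrees, remainder = divmod(round(magnitude * 3600, 6), 3600)
        minutes, seconds = divmod(remainder, 60)
        return f"{int(degrees)}° {int(minutes)}′ {seconds:g}″ {hemisphere}"
    raise ValueError(f"unknown output_format: {output_format!r}")


for fmt in ("dd", "ddh", "dm", "dms"):
    print(fmt, format_coordinate(41.5, is_lat=True, output_format=fmt))
```

For the latitude 41.5 this reproduces the four example values in the list above.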

You can split a column of geographic coordinates into one column for latitude and another for longitude by setting the `split` parameter to True.

Invalid parsing is handled with the `errors` parameter:

* "coerce" (default): invalid parsing will be set to NaN
* "ignore": invalid parsing will return the input
* "raise": invalid parsing will raise an exception
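The three modes can be illustrated with a toy parser. This is only a sketch of the general pattern: `parse_decimal` is a hypothetical name, and it uses `float()` as its entire "parser", which is far simpler than what `clean_lat_long()` accepts.

```python
import math


def parse_decimal(value, errors="coerce"):
    """Parse a decimal-degree value, handling failures per `errors`.

    Toy stand-in for illustration; not dataprep's implementation.
    """
    if errors not in ("coerce", "ignore", "raise"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    try:
        return float(value)
    except (TypeError, ValueError):
        if errors == "raise":
            raise ValueError(f"unable to parse value {value!r}") from None
        if errors == "ignore":
            return value  # hand the input back unchanged
        return math.nan  # "coerce": replace the invalid value with NaN


print(parse_decimal("hello"))                   # NaN
print(parse_decimal("hello", errors="ignore"))  # the original string
```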

After cleaning, a **report** is printed that provides the following information:

* How many values were cleaned (a value counts as cleaned only if it was transformed).
* How many values could not be parsed.
* A summary of the cleaned data: how many values are in the correct format, and how many values are NaN.
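As a rough sketch of how such counts could be tallied, the hypothetical helper below compares each input with its cleaned result. The counting rules here are assumptions made for illustration; the real report may treat edge cases (for example null-like strings) differently.

```python
import math


def summarize(original, cleaned):
    """Tally report-style counts for paired original/cleaned values.

    Hypothetical helper; the real report's exact counting rules may differ.
    """
    def is_nan(v):
        return isinstance(v, float) and math.isnan(v)

    stats = {"cleaned": 0, "unparsed": 0, "correct_format": 0, "nan": 0}
    for before, after in zip(original, cleaned):
        if is_nan(after):
            stats["nan"] += 1
            # A non-null input that ends up NaN could not be parsed.
            if not is_nan(before):
                stats["unparsed"] += 1
        else:
            stats["correct_format"] += 1
            # Only count values whose representation actually changed.
            if before != after:
                stats["cleaned"] += 1
    return stats


original = [(41.5, -81.0), "41.5;-81.0", "hello", math.nan]
cleaned = [(41.5, -81.0), (41.5, -81.0), math.nan, math.nan]
print(summarize(original, cleaned))
```

In this sample, one value is cleaned, one cannot be parsed, two end up in the correct format, and two are NaN.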
  
The following sections demonstrate the functionality of `clean_lat_long()` and `validate_lat_long()`. 

### An example dataset with geographic coordinates

In [None]:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "lat_long":
    [(41.5, -81.0), "41.5;-81.0", "41.5,-81.0", "41.5 -81.0",
     "41.5° N, 81.0° W", "41.5 S;81.0 E", "-41.5 S;81.0 E",
     "23 26m 22s N 23 27m 30s E", "23 26' 22\" N 23 27' 30\" E",
     "UT: N 39°20' 0'' / W 74°35' 0''", "hello", np.nan, "NULL"]
})
df

## 1. Default `clean_lat_long()`

By default, the `output_format` parameter is set to "dd" (decimal degrees) and the `errors` parameter is set to "coerce" (values that cannot be parsed are set to NaN).

In [None]:
from dataprep.clean import clean_lat_long
clean_lat_long(df, "lat_long")

Note that (41.5, -81.0) is not counted as cleaned in the report because its resulting format is the same as its input. Also, "-41.5 S;81.0 E" is invalid because a coordinate with a hemisphere cannot also contain a negative decimal value.
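The hemisphere rule can be sketched for a single coordinate as follows. The regular expression is an assumed toy grammar (optional minus sign, a decimal number, optional degree sign, optional hemisphere letter), and `parse_single` is a hypothetical helper; the real parser accepts many more notations.

```python
import re

# Assumed toy grammar; much narrower than what clean_lat_long() accepts.
PATTERN = re.compile(
    r"^\s*(-)?(\d+(?:\.\d+)?)\s*°?\s*([NSEW])?\s*$", re.IGNORECASE
)


def parse_single(text):
    """Return signed decimal degrees, or None when the text is invalid."""
    match = PATTERN.match(text)
    if match is None:
        return None
    minus, number, hemisphere = match.groups()
    # A hemisphere already fixes the sign, so an explicit minus is rejected.
    if minus and hemisphere:
        return None
    value = float(number)
    # South and west map to negative decimal degrees.
    if minus or (hemisphere and hemisphere.upper() in "SW"):
        value = -value
    return value


print(parse_single("41.5 S"))   # -41.5
print(parse_single("-41.5 S"))  # None
```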

## 2. Output formats

This section demonstrates the supported latitudinal and longitudinal formats.

### decimal degrees hemisphere (ddh)

In [None]:
clean_lat_long(df, "lat_long", output_format="ddh")

### degrees minutes (dm)

In [None]:
clean_lat_long(df, "lat_long", output_format="dm")

### degrees minutes seconds (dms)

In [None]:
clean_lat_long(df, "lat_long", output_format="dms")

## 3. `split` parameter

The `split` parameter adds individual columns containing the cleaned latitude and longitude values to the given DataFrame.

In [None]:
clean_lat_long(df, "lat_long", split=True)
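Conceptually, splitting turns each cleaned (latitude, longitude) pair into two parallel columns. A minimal plain-Python sketch, assuming `None` stands in for a missing pair and using hypothetical column names:

```python
def split_pairs(pairs):
    """Split (lat, long) pairs into two parallel lists.

    Hypothetical helper; None stands in for a missing (NaN) pair, and the
    column names are chosen for illustration only.
    """
    latitudes, longitudes = [], []
    for pair in pairs:
        # A missing pair yields a missing value in both output columns.
        lat, long = pair if pair is not None else (None, None)
        latitudes.append(lat)
        longitudes.append(long)
    return {"latitude": latitudes, "longitude": longitudes}


print(split_pairs([(41.5, -81.0), None, (-41.5, 81.0)]))
```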

The `split` parameter can be combined with any of the output formats.

In [None]:
clean_lat_long(df, "lat_long", split=True, output_format="dm")

## 4. `inplace` parameter
Setting `inplace` to True deletes the given column from the returned DataFrame.
A new column containing the cleaned coordinates is added, with a title in the format `"{original title}_clean"`.

In [None]:
clean_lat_long(df, "lat_long", inplace=True)

### `inplace` and `split`

In [None]:
clean_lat_long(df, "lat_long", split=True, inplace=True)

## 5. Latitude and longitude coordinates in separate columns

### Clean latitude or longitude coordinates individually

In [None]:
df = pd.DataFrame({"lat": [" 30′ 0″ E", "41° 30′ N", "41 S", "80", "hello", "NA"]})
clean_lat_long(df, lat_col="lat")

### Combine and clean separate columns

Latitude and longitude values are counted separately in the report.

In [None]:
df = pd.DataFrame({
    "lat": ["30° E", "41° 30′ N", "41 S", "80", "hello", "NA"],
    "long": ["30° E", "41° 30′ N", "41 W", "80", "hello", "NA"],
})
clean_lat_long(df, lat_col="lat", long_col="long")

### Clean separate columns and split the output

In [None]:
clean_lat_long(df, lat_col="lat", long_col="long", split=True)

## 6. `validate_lat_long()` 

`validate_lat_long()` returns True when the input is a valid latitude or longitude value; otherwise it returns False.
It accepts the same input types as `clean_lat_long()`.

In [None]:
from dataprep.clean import validate_lat_long
print(validate_lat_long("41° 30′ 0″ N"))
print(validate_lat_long("41.5 S;81.0 E"))
print(validate_lat_long("-41.5 S;81.0 E"))
print(validate_lat_long((41.5, 81)))
print(validate_lat_long(41.5, lat_long=False, lat=True))
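One ingredient of validating decimal degrees is a range check: latitudes lie between -90 and 90, and longitudes between -180 and 180. The sketch below shows only that check, using the hypothetical helpers `in_range` and `valid_pair`; it is not how `validate_lat_long()` is implemented and it ignores string formats entirely.

```python
def in_range(value, is_lat):
    """Check a decimal-degree value against the geographic bounds."""
    limit = 90 if is_lat else 180
    return -limit <= value <= limit


def valid_pair(lat, long):
    """A pair is valid only when both coordinates are within bounds."""
    return in_range(lat, is_lat=True) and in_range(long, is_lat=False)


print(valid_pair(41.5, 81))   # True
print(valid_pair(91.0, 81))   # False: latitude exceeds 90
```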

In [None]:
df = pd.DataFrame({"lat_long": 
                   [(41.5, -81.0), "41.5;-81.0", "41.5,-81.0", "41.5 -81.0", 
                    "41.5° N, 81.0° W", "-41.5 S;81.0 E", 
                    "23 26m 22s N 23 27m 30s E", "23 26' 22\" N 23 27' 30\" E", 
                    "UT: N 39°20' 0'' / W 74°35' 0''", "hello", np.nan, "NULL"]
                  })
validate_lat_long(df["lat_long"])

### Validate only one coordinate

In [None]:
df = pd.DataFrame({"lat": 
                   [41.5, "41.5", "41.5  ", 
                    "41.5° N", "-41.5 S", 
                    "23 26m 22s N", "23 26' 22\" N", 
                    "UT: N 39°20' 0''", "hello", np.nan, "NULL"]
                  })
validate_lat_long(df["lat"], lat_long=False, lat=True)