## Mapping the World's Endangered Languages
### CSC630: Rudd Fawcett


Below, please find specifications about the development of my map project. The website [is live here](http://csc630.rudd.io/projects/endangered-languages/).

This project focused on mapping endangered languages from around the world. It produced two maps: a globe and a single continental map of Africa. The goals of the project were twofold:

1. To develop a beautiful experience and website that accurately and interactively mapped endangered languages.
2. To develop a firm grasp of mapping and projections through the use of d3.js.

As someone who is passionate about the usage and preservation of languages around the world, I was eager to start working on this project.

Overall, this project would not have been possible without a great dataset from *The Guardian* that was [distributed for free use on Kaggle](https://www.kaggle.com/the-guardian/extinct-languages). The dataset covers over two thousand endangered languages and includes each language's endangerment status, a rough estimate of its number of speakers, coordinates for its epicenter, and other metadata.

Throughout this particular project, I also developed new templates on which the rest of my final data visualization projects are built.

### Preparing Data for Use

The data was used largely as is from *The Guardian*, but I did have to clean it to remove some mislabeled languages whose coordinates placed them, for example, in the middle of the ocean. I did so using the following Python script:

In [None]:
import json
import csv
from mpl_toolkits.basemap import Basemap
import pandas as pd
import shapely.geometry
bm = Basemap()

original = open('raw/endangered-languages.csv')

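def clean_coordinates():
    """Split the raw rows into on-land and off-land CSV files."""
    on_land = []
    off_land = []

    # Rewind the shared file handle so the function can be re-run in the notebook.
    original.seek(0)

    for line in csv.DictReader(original):
        # Skip rows that are missing either coordinate.
        if line['Longitude'] and line['Latitude']:
            x = float(line['Longitude'])
            y = float(line['Latitude'])

            # With the default Basemap projection, x and y are longitude and latitude.
            if bm.is_land(x, y):
                on_land.append(line)
            else:
                off_land.append(line)

    land = pd.DataFrame(on_land)
    off = pd.DataFrame(off_land)

    land.to_csv('clean/endangered-languages-on-land.csv', index=False)
    off.to_csv('clean/endangered-languages-off-land.csv', index=False)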

This script uses the open-source Basemap library and loops through every point in *The Guardian*'s CSV file, checking whether it falls on land (`on_land`) or not (`off_land`).
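
The partitioning step can also be sketched without Basemap, which makes it easier to try on a few rows. The version below is a minimal illustration rather than the script used for the project: it adds a range check on the coordinates and a third `skipped` bucket, and it takes the land test as a parameter. The helper names (`parse_coordinates`, `partition_rows`, `fake_is_land`), the `Name` column, and the sample rows are all made up for the example; only the `Longitude` and `Latitude` column names come from the script above.

```python
import csv
import io


def parse_coordinates(row):
    """Return (longitude, latitude) as floats, or None if missing or invalid."""
    try:
        lon = float(row['Longitude'])
        lat = float(row['Latitude'])
    except (KeyError, TypeError, ValueError):
        return None
    # Reject values outside the valid geographic range (this also rejects NaN).
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        return None
    return lon, lat


def partition_rows(rows, is_land):
    """Split rows into (on_land, off_land, skipped) using the given predicate."""
    on_land, off_land, skipped = [], [], []
    for row in rows:
        coords = parse_coordinates(row)
        if coords is None:
            skipped.append(row)
        elif is_land(*coords):
            on_land.append(row)
        else:
            off_land.append(row)
    return on_land, off_land, skipped


# Made-up sample data; row C has no coordinates and row D is out of range.
sample = io.StringIO(
    "Name,Longitude,Latitude\n"
    "A,10.0,5.0\n"
    "B,-30.0,0.0\n"
    "C,,\n"
    "D,500.0,12.0\n"
)


def fake_is_land(lon, lat):
    """Stand-in for bm.is_land: pretends everything east of 0 degrees is land."""
    return lon > 0


on_land, off_land, skipped = partition_rows(csv.DictReader(sample), fake_is_land)
```

In the real workflow, `bm.is_land` could presumably be passed in place of `fake_is_land`, since it accepts the same two arguments in the same order.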

### Additional Challenges
Another goal of this project was to push the boundaries of the various map projections that I had come to know and use in previous projects for the course. Ultimately, I ended up using the `d3.geoOrthographic()` and `d3.geoChamberlinAfrica()` projections for my two maps. I chose to map Africa as a standalone continent purely because it had its own projection available. This raised the question of how best to display the data, however. Given the large number of points, it didn't make sense to keep plotting points outside Africa, and the same was true for the GeoJSON and TopoJSON data. Therefore, I came up with the following solution:

1. Use Python (again) and a custom Africa GeoJSON map file.
2. Load the same map file both in Python (with Shapely) and in d3.js.
3. Loop through the languages in Python and discard any not within the bounds of continental Africa.
4. Save to a new CSV called `endangered-languages-africa.csv`.

In [None]:
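with open('africa.geojson') as f:
    africa_geojson = json.load(f)

# Build each country's shape once rather than once per language.
# shapely.geometry.shape replaces asShape, which was removed in Shapely 2.0.
africa_shapes = [shapely.geometry.shape(country['geometry'])
                 for country in africa_geojson['features']]

def africa_coordinates():
    """Keep only the languages whose coordinates fall inside an African country."""
    in_africa = []

    # Rewind the shared file handle; an earlier read may have left it at the end.
    original.seek(0)

    for line in csv.DictReader(original):
        if line['Longitude'] and line['Latitude']:
            x = float(line['Longitude'])
            y = float(line['Latitude'])
            point = shapely.geometry.Point(x, y)

            # any() stops at the first match, so no language is added twice.
            if any(shape.contains(point) for shape in africa_shapes):
                in_africa.append(line)

    africa = pd.DataFrame(in_africa)
    africa.to_csv('clean/endangered-languages-africa.csv', index=False)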
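
To make the containment test less of a black box, here is a rough pure-Python sketch of the idea behind `shape.contains(point)`, using the classic ray-casting approach. It is not how Shapely is implemented and it does not treat points exactly on a border specially. The function names and the square test polygon are hypothetical, chosen only for illustration.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count how many ring edges a ray heading east from the point crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        # Slice to two values in case a position also carries an altitude.
        x1, y1 = ring[i][:2]
        x2, y2 = ring[(i + 1) % n][:2]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def point_in_geometry(lon, lat, geometry):
    """Test a point against a GeoJSON Polygon or MultiPolygon, honoring holes."""
    if geometry['type'] == 'Polygon':
        polygons = [geometry['coordinates']]
    elif geometry['type'] == 'MultiPolygon':
        polygons = geometry['coordinates']
    else:
        return False
    for exterior, *holes in polygons:
        in_exterior = point_in_ring(lon, lat, exterior)
        in_hole = any(point_in_ring(lon, lat, hole) for hole in holes)
        if in_exterior and not in_hole:
            return True
    return False


# Hypothetical square "country" spanning 0 to 10 degrees in both directions.
square = {
    'type': 'Polygon',
    'coordinates': [[[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]],
}

inside = point_in_geometry(5, 5, square)    # True: the point is within the square
outside = point_in_geometry(15, 5, square)  # False: the point lies east of it
```

For real country outlines, Shapely remains the better choice, since it handles edge cases such as points on a boundary that this sketch ignores.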

## Final Thoughts
This proved to be a great project for me to work on. It pushed both my aesthetic design ability and my ability to problem-solve (map projections can be HARD). Overall, I am glad that I had the opportunity to work on such a project. Make sure to check it out here: http://csc630.rudd.io/projects/endangered-languages/.

## Citations and Attributions

- *Extinct Languages*. V1. December 7, 2016. Distributed by The Guardian via Kaggle. https://www.kaggle.com/the-guardian/extinct-languages.

- World Atlas TopoJSON 110m. April 1, 2017. Distributed by Mike Bostock. http://bl.ocks.org/mbostock/raw/4090846/world-110m.json.

- World Country Names TSV. Date unknown. Distributed by Mike Bostock. http://bl.ocks.org/mbostock/raw/4090846/world-country-names.tsv.

- Africa GeoJSON. https://github.com/codeforamerica/click_that_hood/blob/master/public/data/africa.geojson

