We extract the OSM data for Chicago from www.openstreetmap.org. We represent each top-level node or way as a Python dictionary whose keys come from the attributes of the top-level element (way or node) and of its children. First, the node and way elements provide the keys "version", "changeset", "timestamp", "user" and "uid". We put these closely related keys into a group named 'created'. They describe the user who created the element, the time the element was created and how many times it has been modified. Second, we make some adjustments to the address information provided by each top-level element. We have the following keys:

'add:city', 'addr', 'addr:STE', 'addr:city', 'addr:country', 'addr:full', 'addr:housename', 'addr:place', 'addr:housenumber', 'addr:interpolation', 'addr:postcode', 'addr:province', 'addr:state', 'addr:street', 'addr:street:name', 'addr:street:prefix', 'addr:street:suffix', 'addr:street:type', 'addr:suite', 'addr:unit'

We combine all the information stored under the above keys into a dictionary named 'address' with keys city, STE, country, street and so on. For example, we transform the following two elements:

```
<node id='1031824591' version='3' timestamp='2010-12-11T20:56:27Z' uid='70696' user='xybot' changeset='6629487' lat='41.9890455' lon='-87.7928965'>
    <tag k='addr:housenumber' v='5943'/>
    <tag k='addr:postcode' v='60631'/>
    <tag k='addr:street' v='N. Northwest Hwy'/>
    <tag k='amenity' v='pub'/>
    <tag k='name' v='Trinity Pub'/>
    <tag k='source' v='Jukebox'/>
    <tag k='website' v='http://www.trinitypubchicago.com'/>
    <tag k='wifi' v='free'/>
</node>
```
and

```
<way id='231098247' version='2' timestamp='2013-11-29T06:40:34Z' uid='567034' user='Umbugbene' changeset='19172664'>
    <nd ref='2395056316'/>
    <nd ref='2395056299'/>
    <nd ref='2395056310'/>
    <nd ref='2395056285'/>
    <nd ref='2395056316'/>
    <tag k='addr:housenumber' v='2502-2524'/>
    <tag k='addr:postcode' v='60616'/>
    <tag k='addr:street' v='South Dr. Martin Luther King, Jr. Drive'/>
    <tag k='building' v='terrace'/>
</way>
```

into the following Python dictionaries:

```
{
"id": "1031824591",
"type": "node",
"created": {
           "version": "3",
           "changeset": "6629487",
           "timestamp": "2010-12-11T20:56:27Z",
           "user": "xybot",
           "uid": "70696"
         },
"pos": [41.9890455, -87.7928965],
"address": {
           "housenumber": "5943",
           "postcode": "60631",
           "street": "N. Northwest Hwy"
         },
"amenity": "pub",
"name": "Trinity Pub",
"source": "Jukebox",
"website": "http://www.trinitypubchicago.com",
"wifi": "free"
}
```
and

```
{
"id": "231098247",
"type": "way",
"created": {
           "version": "2",
           "changeset": "19172664",
           "timestamp": "2013-11-29T06:40:34Z",
           "user": "Umbugbene",
           "uid": "567034"
         },
"node_refs": ["2395056316", "2395056299", "2395056310", "2395056285", "2395056316"],
"address": {
           "housenumber": "2502-2524",
           "postcode": "60616",
           "street": "South Dr. Martin Luther King, Jr. Drive"
         },
"building": "terrace"
}
```
respectively.
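The transformation above can be sketched roughly as follows. This is a minimal illustration rather than the exact code used for the project: the function name `shape_element` and the constant `CREATED` are hypothetical, and the sketch assumes every `addr:*` tag is simply stored under the part of its key that follows the first colon.

```python
import xml.etree.ElementTree as ET

# Attributes grouped under 'created' (names taken from the text above).
CREATED = ["version", "changeset", "timestamp", "user", "uid"]


def shape_element(element):
    """Turn a top-level <node> or <way> element into a nested dictionary."""
    if element.tag not in ("node", "way"):
        return None
    doc = {"type": element.tag, "created": {}}
    for key, value in element.attrib.items():
        if key in CREATED:
            doc["created"][key] = value
        elif key not in ("lat", "lon"):
            doc[key] = value
    # Nodes carry coordinates; ways do not.
    if "lat" in element.attrib and "lon" in element.attrib:
        doc["pos"] = [float(element.attrib["lat"]), float(element.attrib["lon"])]
    for tag in element.iter("tag"):
        key, value = tag.attrib["k"], tag.attrib["v"]
        if key.startswith("addr:"):
            # 'addr:street' is stored as address['street'].
            doc.setdefault("address", {})[key.split(":", 1)[1]] = value
        else:
            doc[key] = value
    # Ways list the nodes they contain as <nd ref='...'/> children.
    refs = [nd.attrib["ref"] for nd in element.iter("nd")]
    if refs:
        doc["node_refs"] = refs
    return doc


# Shortened version of the Trinity Pub node shown above.
sample = ET.fromstring(
    "<node id='1031824591' version='3' lat='41.9890455' lon='-87.7928965'>"
    "<tag k='addr:street' v='N. Northwest Hwy'/>"
    "<tag k='amenity' v='pub'/>"
    "</node>"
)
doc = shape_element(sample)
```

For a full extract, the same function could be applied to each element yielded by `ET.iterparse`, which avoids loading the whole file into memory.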


Intrinsically, a way consists of nodes. We list all the nodes that a way contains under the node_refs key.

We now examine the address keys more closely. We observe that the value of the 'addr' key is actually a street name, so we replace this key with 'addr:street'. Likewise, 'add:city' is a misspelling of 'addr:city'. Apart from a few exceptions, 'addr:full' is an exact combination of 'addr:housenumber' and 'addr:street'. After dealing with the exceptions, we delete this key because it is redundant.

Now we come to our main task: cleaning the street names. There are some inconsistencies, and some street names lack a street type or contain almost a full address. We replace abbreviations with their full forms. Also observe that 'addr:street' is a combination of 'addr:street:name', 'addr:street:prefix', 'addr:street:type' and 'addr:street:suffix', so we ignore the last three. Consider two examples:

{'k': 'addr:street', 'v': '1050 Essington Rd. Joliet, IL 60435'}

{'k': 'addr:street', 'v': 'North Sherman Avenue #104'}

We should move each piece of information above into its correct place: for example, Joliet to 'addr:city', IL to 'addr:state' and 104 to 'addr:suite'. The value of 'addr:street' must not contain characters such as '#', ':' or '-'.
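The key corrections and the splitting of over-full street values could look roughly like the sketch below. The names `fix_keys`, `split_street` and `is_clean_street` are hypothetical, and the regular expressions assume the two layouts seen in the examples: a trailing "City, ST 12345" with a one-word city, and a trailing "#suite". Real data will need more patterns than these.

```python
import re

# Mistyped keys and their corrected forms, as described in the text.
KEY_FIXES = {"addr": "addr:street", "add:city": "addr:city"}

# Assumption: the city is a single word, because nothing in the string marks
# where the street ends and the city begins.
CITY_STATE_ZIP = re.compile(
    r"\s+(?P<city>[A-Za-z]+),\s*(?P<state>[A-Z]{2})\s+(?P<postcode>\d{5})$"
)
SUITE = re.compile(r"\s*#\s*(?P<suite>\w+)$")
HOUSENUMBER = re.compile(r"^(?P<housenumber>\d+)\s+")
FORBIDDEN = re.compile(r"[#:-]")


def fix_keys(tags):
    """Rename the mistyped keys and drop the redundant 'addr:full'."""
    fixed = {}
    for key, value in tags.items():
        if key == "addr:full":
            continue  # assumes its few exceptions were already handled
        fixed[KEY_FIXES.get(key, key)] = value
    return fixed


def split_street(value):
    """Move the non-street parts of an 'addr:street' value into their own keys."""
    address = {}
    value = value.strip()
    for pattern in (CITY_STATE_ZIP, SUITE, HOUSENUMBER):
        match = pattern.search(value)
        if match:
            address.update(match.groupdict())
            # Cut the matched part out of the street string.
            value = (value[:match.start()] + value[match.end():]).strip()
    address["street"] = value
    return address


def is_clean_street(street):
    """True if the street contains none of the characters '#', ':' or '-'."""
    return FORBIDDEN.search(street) is None


first = split_street("1050 Essington Rd. Joliet, IL 60435")
second = split_street("North Sherman Avenue #104")
```

Here `first["street"]` is left as 'Essington Rd.', which still needs its abbreviation expanded in a later step.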

Some Abbreviations:

Street (St.), Avenue (Ave.), Boulevard (Blvd.), Drive (Dr.), Court (Ct.), Place (Pl.), Square (Sq.), Lane (Ln.), Road (Rd.), Trail (Trl.), Parkway (Pkwy.), Commons (Cmn.), Route (Rte.), Highway (Hwy.), Access (Accs.), Walk (Wlk.), Way (Wy.), Market (Mkt.), Circle (Cir.), Row (Row), Center (Ctr.), Terrace (Terr.), Park (Pk.), East (E.), North (N.), South (S.), West (W.), Extension (Ex.)

Some examples of changes:

1. Belmont ------ Belmont Avenue
2. N St. Clair St ------ North Saint Clair Street
3. E Park Ave ------ East Park Avenue
4. N Glenwood ------ North Glenwood Avenue
5. Towne Ct ------ Towne Court
6. North Edens PKWY ------ North Edens Parkway
7. E Main St ------ East Main Street
8. East Van Buren ------ East Van Buren Avenue
9. Margaret Pl ---------- Margaret Place
10. HWY 59 ---------- Highway 59
11. N Kedzie ---------- North Kedzie Avenue
12. US 34 ---------- US Route 34
13. North Humboldt SD ---------- North Humboldt Square
14. East Carver PLZ ---------- East Carver Plaza
15. Route 38 ---------- US Route 38
16. US-6 ---------- US Route 6
17. N Damen ---------- North Damen Avenue
18. Shabbona ---------- Shabbona Street
19. 1050 Essington Rd. Joliet, IL 60435 ---------- Essington Road
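Several of these changes are plain abbreviation expansions, which can be sketched as below. The function `expand_street` and both tables are hypothetical and cover only part of the abbreviation list. The sketch makes three assumptions: a direction letter is expanded only as the first word, 'St' followed by another word means 'Saint', and a street type is expanded only when it is the last word or is followed by a number. Names with no street type at all (Belmont, N Kedzie) and route numbers (US 34, US-6) need a separate lookup that is not shown here.

```python
# Street-type abbreviations (a subset of the list above, plus PLZ from example 14).
EXPANSIONS = {
    "st": "Street", "ave": "Avenue", "blvd": "Boulevard", "dr": "Drive",
    "ct": "Court", "pl": "Place", "rd": "Road", "ln": "Lane",
    "pkwy": "Parkway", "hwy": "Highway", "plz": "Plaza",
}
DIRECTIONS = {"n": "North", "s": "South", "e": "East", "w": "West"}


def expand_street(name):
    """Expand direction and street-type abbreviations in a street name."""
    words = name.split()
    result = []
    for i, word in enumerate(words):
        key = word.rstrip(".").lower()
        is_last = i == len(words) - 1
        next_is_number = not is_last and words[i + 1].isdigit()
        if i == 0 and not is_last and key in DIRECTIONS:
            result.append(DIRECTIONS[key])
        elif key == "st" and not is_last:
            # 'St' before another word reads as 'Saint', e.g. 'St. Clair'.
            result.append("Saint")
        elif key in EXPANSIONS and (is_last or next_is_number):
            result.append(EXPANSIONS[key])
        else:
            # Leave everything else alone, e.g. 'Dr.' in 'Dr. Martin Luther King'.
            result.append(word)
    return " ".join(result)


cleaned = [expand_street(s) for s in ("N St. Clair St", "E Park Ave", "HWY 59")]
```

Restricting street-type expansion to the end of the name is what keeps 'South Dr. Martin Luther King, Jr. Drive' unchanged instead of turning 'Dr.' into 'Drive'.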
