# Concatenate dataframes


In this exercise, you’ll practice concatenating records by creating a dataset of the 100 highest-rated cafes in New York City according to Yelp.

APIs often limit the amount of data returned, since sending large datasets can be time- and resource-intensive. The Yelp Business Search API limits the results returned in a call to 50 records. However, the offset parameter lets a user retrieve results starting after a specified number of records. By modifying the offset, we can get results 1-50 in one call and 51-100 in another. Then we can concatenate the two dataframes.
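The paging arithmetic described above can be sketched as follows. This is a minimal illustration, not part of the exercise: `build_page_params` is a hypothetical helper, and it only builds the parameter dictionaries without calling the API.

```python
def build_page_params(total, page_size=50, **search_terms):
    """Return one params dict per call needed to cover `total` results.

    Hypothetical helper for illustration; it makes no network requests.
    """
    pages = []
    for offset in range(0, total, page_size):
        # The last page may need fewer than page_size records
        limit = min(page_size, total - offset)
        pages.append({**search_terms, "limit": limit, "offset": offset})
    return pages


pages = build_page_params(100, term="cafe", location="NYC", sort_by="rating")
for page in pages:
    print(page["offset"], page["limit"])
# 0 50
# 50 50
```

An offset of 0 returns results 1-50, and an offset of 50 skips those and returns results 51-100.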

pandas (as pd), requests, and json_normalize() have been imported. The 50 top-rated cafes are already in a dataframe, top_50_cafes.

Instructions

Add an "offset" parameter to params so that the Yelp API call will get cafes 51-100.
Concatenate the results of the API call to top_50_cafes, setting ignore_index so rows will be renumbered.
Print the shape of the resulting dataframe, cafes, to confirm there are 100 records.

In [1]:
# Add an offset parameter to get cafes 51-100
params = {"term": "cafe",
          "location": "NYC",
          "sort_by": "rating",
          "limit": 50,
          "offset": 50}

# api_url and headers are assumed to be defined by the exercise environment
result = requests.get(api_url, headers=headers, params=params)
next_50_cafes = json_normalize(result.json()["businesses"])

# Concatenate the results, setting ignore_index to renumber rows
# (DataFrame.append() was removed in pandas 2.0; pd.concat() is the replacement)
cafes = pd.concat([top_50_cafes, next_50_cafes], ignore_index=True)

# Print shape of cafes
print(cafes.shape)
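To see what ignore_index changes, here is a small sketch on made-up data (the `first` and `second` frames and their values are invented for illustration and are not Yelp results):

```python
import pandas as pd

# Two toy frames, each with its own default 0..n-1 index
first = pd.DataFrame({"name": ["A", "B"], "rating": [5.0, 4.5]})
second = pd.DataFrame({"name": ["C", "D"], "rating": [4.5, 4.0]})

# Without ignore_index, the original row labels are kept and repeat
kept = pd.concat([first, second])
print(kept.index.tolist())        # [0, 1, 0, 1]

# With ignore_index=True, rows are renumbered from 0
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(renumbered.shape)           # (4, 2)
```

Duplicate index labels are legal in pandas, but they make label-based lookups such as `.loc[0]` return several rows, which is why renumbering is usually preferable here.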

# Merge dataframes


In the last exercise, you built a dataset of the top 100 cafes in New York City according to Yelp. Now, you'll combine that with demographic data to investigate which neighborhood has the most good cafes per capita.

To do this, you'll merge two datasets with the DataFrame merge() method. The first, crosswalk, is a crosswalk between ZIP codes and Public Use Microdata Areas (PUMAs), which are aggregates of census tracts and correspond roughly to NYC neighborhoods. Then you'll merge in pop_data, which contains 2016 population estimates for each PUMA.

pandas (as pd) has been imported, as has the cafes dataframe from the last exercise.

Question
Explore the cafes and crosswalk dataframes in the console. Which columns should be used as join keys?

answers

location_zip_code in cafes and zipcode in crosswalk

Question
Explore the crosswalk and pop_data dataframes in the console. Which columns should be used as join keys?

answers

puma in both
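A toy sketch of merging on differently named key columns may help here. The frames and values below (including the PUMA codes) are invented for illustration; only the column names location_zip_code, zipcode, and puma come from the exercise.

```python
import pandas as pd

# Made-up stand-ins for cafes and crosswalk
cafes_toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "location_zip_code": ["10001", "10002", "99999"],
})
crosswalk_toy = pd.DataFrame({
    "zipcode": ["10001", "10002"],
    "puma": [3807, 3809],  # invented codes
})

# left_on/right_on name the key column in each frame
merged = cafes_toy.merge(crosswalk_toy,
                         left_on="location_zip_code",
                         right_on="zipcode")

print(merged.columns.tolist())
# ['name', 'location_zip_code', 'zipcode', 'puma']
print(len(merged))  # 2
```

Two things are worth noticing. Both key columns survive in the result because their names differ, and the default inner join silently drops cafe "C", whose ZIP code has no match in the crosswalk. The keys also need compatible dtypes: in this sketch both are strings, and merging a string column against an integer column would raise an error.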

Instructions

Use the DataFrame merge() method to merge cafes and crosswalk on location_zip_code and zipcode, respectively. Assign the result to cafes_with_pumas.
Merge pop_data into cafes_with_pumas on their puma fields. Save the result as cafes_with_pop.


In [2]:
# Merge crosswalk into cafes on their zip code fields
cafes_with_pumas = cafes.merge(crosswalk,
                               left_on="location_zip_code",
                               right_on="zipcode")

# Merge pop_data into cafes_with_pumas on puma field
# (the key has the same name in both frames, so on= is enough)
cafes_with_pop = cafes_with_pumas.merge(pop_data, on="puma")

# View the data
print(cafes_with_pop.head())
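One possible next step toward the per-capita question is sketched below on made-up data. The population column name `total_pop` is an assumption (the exercise does not say what pop_data's columns are called), and the numbers are invented.

```python
import pandas as pd

# Made-up stand-in for cafes_with_pop: one row per cafe,
# with its PUMA's population repeated on each row
cafes_with_pop_toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "puma": [1, 1, 2],
    "total_pop": [100_000, 100_000, 25_000],  # hypothetical column name
})

# Count cafes per PUMA and keep one population value per PUMA
per_puma = cafes_with_pop_toy.groupby("puma").agg(
    n_cafes=("name", "count"),
    population=("total_pop", "first"),
)

# Scale to cafes per 10,000 residents so the numbers are readable
per_puma["cafes_per_10k"] = per_puma["n_cafes"] / per_puma["population"] * 10_000

print(per_puma.sort_values("cafes_per_10k", ascending=False))
```

In this toy data, PUMA 2 ranks first despite having fewer cafes, because its population is much smaller.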