# Beginning of Project - Assembling Different Data Sources

In [25]:
# imports
import pandas as pd
import geopandas as gpd
from shapely import wkt
import numpy as np


# NY State retail food store licenses (a copy of the state dataset hosted on GitHub)
stores_ny = pd.read_csv('https://raw.githubusercontent.com/mycow2/expensive_groceries/refs/heads/main/Retail_Food_Stores_20241226.csv')
# parse the WKT points in 'Georeference' into shapely geometries
stores_ny['geometry'] = stores_ny['Georeference'].apply(wkt.loads)
geo_stores_ny = gpd.GeoDataFrame(stores_ny, geometry='geometry', crs='EPSG:4326')
# NYC Health 2019 food pricing survey (cleaned, imputed) from GitHub
prices_nyc = pd.read_csv('https://raw.githubusercontent.com/nychealth/food-pricing-survey-nyc-2019/refs/heads/main/Cleaned_Pricing_data_imputed_final.csv', index_col=0)

In [26]:
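# limit to stores in the five NYC counties
nyc_counties = ['NEW YORK', 'KINGS', 'QUEENS', 'RICHMOND', 'BRONX']
# .copy() makes an independent frame, so adding a column below does not raise SettingWithCopyWarning
geo_stores_nyc = geo_stores_ny[geo_stores_ny['County'].isin(nyc_counties)].copy()

# flag "offender" chains: matched by owner entity, or by 'MORTON' (Morton Williams) in the DBA name
offenders = ['DAGOSTINOS MARKETS LLC', 'NAMDOR INC']
is_offender = (geo_stores_nyc['Entity Name'].isin(offenders)
               | geo_stores_nyc['DBA Name'].str.contains('MORTON', na=False))  # missing DBA Name counts as no match
geo_stores_nyc['offender'] = np.where(is_offender, 1, 0)

# drop stores listed under an NYC county that are not actually located in NYC
geo_stores_nyc = geo_stores_nyc[~geo_stores_nyc['License Number'].isin([756931, 712975, 753473])]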

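The flagging step can be packaged as a small function that works on an explicit copy (avoiding pandas' SettingWithCopyWarning) and treats a missing DBA name as a non-match via `na=False`. This is a sketch on hypothetical toy rows; the function name `flag_offenders` and the sample entity names are illustrative, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for geo_stores_nyc (column names match the real data).
stores = pd.DataFrame({
    'Entity Name': ['NAMDOR INC', 'DAGOSTINOS MARKETS LLC', 'SOME DELI CORP',
                    'MW 1 LLC', 'CORNER MART INC', 'UPSTATE FOODS INC'],
    'DBA Name': ['GRISTEDES', 'DAGOSTINO', 'SOME DELI', 'MORTON WILLIAMS', None, 'UPSTATE FOODS'],
    'County': ['NEW YORK', 'NEW YORK', 'KINGS', 'NEW YORK', 'QUEENS', 'ALBANY'],
})

nyc_counties = ['NEW YORK', 'KINGS', 'QUEENS', 'RICHMOND', 'BRONX']
offenders = ['DAGOSTINOS MARKETS LLC', 'NAMDOR INC']


def flag_offenders(df, entity_names, dba_pattern):
    """Return a copy of df with a 0/1 'offender' column (hypothetical helper).

    A row is flagged when its 'Entity Name' is in entity_names or its
    'DBA Name' contains dba_pattern; a missing DBA name counts as no match.
    """
    out = df.copy()  # explicit copy, so assigning a new column is unambiguous
    is_offender = (out['Entity Name'].isin(entity_names)
                   | out['DBA Name'].str.contains(dba_pattern, na=False))
    out['offender'] = np.where(is_offender, 1, 0)
    return out


stores_nyc = flag_offenders(stores[stores['County'].isin(nyc_counties)], offenders, 'MORTON')
print(stores_nyc[['Entity Name', 'offender']])
```

Because the function never writes into its input, the original frame stays unchanged and the same call can be reused with a different owner list or DBA pattern.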


In [27]:
# Manhattan stores of establishment type A, AC, or ABC, colored by the offender flag (others shown in green)
manhattan = geo_stores_nyc[(geo_stores_nyc['County'] == 'NEW YORK')
                           & (geo_stores_nyc['Establishment Type'].isin(['AC', 'ABC', 'A']))]
manhattan.explore(tiles='CartoDB Positron', column='offender', cmap='Set1_r', categorical=True,
                  tooltip=['Entity Name', 'DBA Name', 'Establishment Type',
                           'License Number', 'Zip Code', 'Square Footage'])

In [None]:
prices_nyc

In [28]:
# most common owners (by entity name) among the mapped Manhattan stores
owners = manhattan['Entity Name'].value_counts()
owners

CVS ALBANY LLC                      55
DUANE READE ETAL PTRS               37
DUANE READE ET AL PTRS              27
JUICE PRESS LLC THE                 24
NAMDOR INC                          18
                                    ..
BLUE SKY CHOPPED CHEESE DEL CORP     1
HARLEM FOOD SQUARE CORP              1
FELIX GOURMET INC                    1
YORK AMBE DELI GROCERY LLC           1
PUNJABI GROCERY&DELI INC             1
Name: Entity Name, Length: 1567, dtype: int64
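The counts above split one owner across two spellings ("DUANE READE ETAL PTRS" with 37 stores and "DUANE READE ET AL PTRS" with 27), so the real chain has at least 64 stores in this set. A rough normalization before `value_counts` can merge such variants. This is a minimal sketch: the helper `normalize_entity` and its regex rules are assumptions that only handle punctuation, spacing, and the "ET AL" pattern; other variants would still need a manual mapping.

```python
import re
import pandas as pd


def normalize_entity(name):
    """Collapse simple spelling variants of an owner name (a rough, hypothetical heuristic)."""
    if not isinstance(name, str):
        return name  # leave missing values (NaN/None) untouched
    name = re.sub(r'[^\w\s]', ' ', name.upper())  # punctuation -> space, e.g. "&", "."
    name = re.sub(r'\bET\s*AL\b', 'ETAL', name)    # "ET AL" and "ETAL" -> "ETAL"
    return re.sub(r'\s+', ' ', name).strip()        # squeeze repeated whitespace


# Hypothetical sample of raw entity names, including the two Duane Reade spellings.
raw = pd.Series(['DUANE READE ETAL PTRS', 'DUANE READE ET AL PTRS', 'DUANE READE ET AL PTRS',
                 'CVS ALBANY LLC', 'PUNJABI GROCERY&DELI INC'])
counts = raw.map(normalize_entity).value_counts()
print(counts)
```

In the notebook the same idea would be applied as `manhattan['Entity Name'].map(normalize_entity).value_counts()`.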

What you can see here is that Namdor (Gristedes, Foodtown), D'Agostino, and Morton Williams operate all around Manhattan. They have a particularly strong presence in Kips Bay, the UES, and the entire band between Houston Street and 34th Street, and are notably absent from Upper Manhattan, the Lower East Side, and Downtown Manhattan in general. Overall, I was surprised by how widespread they are in the city.
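To put numbers on where these chains concentrate, the offender flag can be summarized per ZIP code. This is a sketch under stated assumptions: ZIP codes are only a rough proxy for neighborhoods such as Kips Bay or the UES, and the rows below are hypothetical toy values, not results from the dataset. The helper `offender_share_by_zip` is illustrative.

```python
import pandas as pd

# Hypothetical toy rows; in the notebook the input would be `manhattan`,
# which has 'Zip Code' and 'offender' columns.
stores = pd.DataFrame({
    'Zip Code': [10016, 10016, 10016, 10028, 10028, 10034, 10034, 10034],
    'offender': [1, 1, 0, 1, 0, 0, 0, 0],
})


def offender_share_by_zip(df):
    """Stores, offender stores, and offender share per ZIP code, highest share first."""
    # Named aggregation on a SeriesGroupBy: 'count' = number of stores, 'sum' = flagged stores.
    summary = df.groupby('Zip Code')['offender'].agg(stores='count', offenders='sum')
    summary['share'] = summary['offenders'] / summary['stores']
    return summary.sort_values('share', ascending=False)


print(offender_share_by_zip(stores))
```

Sorting by share rather than raw count keeps ZIP codes with many small stores from dominating the ranking.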

This, along with my own experience and Reddit threads, has led me to hypothesize a few reasons for their continued presence:
* Lack of competition
* Consumer insensitivity to price / quality
* Network effects / consumer lock-in
* Real estate: they are more interested in holding the property so that it eventually appreciates
* Or is this just a matter of time? 
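One way to start probing the first hypothesis (lack of competition) is to count, for each offender store, how many other stores sit within walking distance. This sketch assumes latitude/longitude columns, which in the notebook could come from `manhattan.geometry.y` and `manhattan.geometry.x` since the data is in EPSG:4326. The 400 m radius, the helper names, and the toy coordinates are all hypothetical, and "competitor" here simply means any non-offender store in the same frame (filtering by 'Square Footage' might be a better definition).

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; inputs in degrees, broadcastable arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def competitors_within(df, radius_m=400):
    """For each offender store, count non-offender stores within radius_m meters."""
    off = df[df['offender'] == 1]
    rest = df[df['offender'] == 0]
    # Pairwise distance matrix: rows are offender stores, columns are all other stores.
    d = haversine_m(off['lat'].to_numpy()[:, None], off['lon'].to_numpy()[:, None],
                    rest['lat'].to_numpy()[None, :], rest['lon'].to_numpy()[None, :])
    return pd.Series((d <= radius_m).sum(axis=1), index=off.index, name='competitors')


# Hypothetical toy coordinates (not real store locations).
stores = pd.DataFrame({
    'lat': [40.7440, 40.7450, 40.7500, 40.7800, 40.7810, 40.7445],
    'lon': [-73.9800, -73.9800, -73.9800, -73.9500, -73.9500, -73.9805],
    'offender': [1, 0, 0, 1, 0, 0],
})
print(competitors_within(stores, radius_m=400))
```

The full distance matrix is simple and fine for a few thousand Manhattan stores; for larger sets, projecting to a local CRS such as EPSG:2263 and using a geopandas spatial join would scale better.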