### This notebook is used for the Applied Data Science Capstone Project.

### The capstone project is the culminating activity of the IBM Data Science Professional Certificate. The certificate covers a broad range of topics, from data science methodology and data analysis with Python to data visualization and machine learning.


### Table of Contents
1. [Introduction](#introduction)
2. [Literature Review](#literature-review)
3. [Data](#data)
4. [Methodology](#methodology)
5. [Results](#results)
6. [Discussion](#discussion)
7. [Conclusion](#conclusion)
8. [Bibliography](#bibliography)

### Introduction

Toronto is a city with a large immigrant population. However, migration is an expensive and risky endeavor for many migrants, so an informed approach to immigration is wise. Two broad strategies are available to the prospective migrant: trend following and trend setting.

In trend following, the prospective migrant tries to find out how others have migrated. The basics have to be covered first: a source of income, typically employment, and a place of residence. The presence of food sellers catering to particular cuisines within a neighborhood can help in understanding who lives in the area.

The other strategy is probably more appealing to those with an entrepreneurial mindset. It involves identifying potential growth areas: products and services that are needed but missing from a neighborhood. Because it is entrepreneurial in approach, it is also riskier, since a missing product or service can simply indicate that there is no demand for it.

Whichever strategy is adopted, a deeper understanding of the target neighborhood is better than immigrating without any knowledge of the destination.

This project seeks to answer the question of what it takes to immigrate successfully into Canada, and Toronto in particular.

As a strategy, this project profiles the neighborhoods in terms of demographic information and compares this against employment and business licensing information. It attempts to establish whether there are useful success patterns, and to find possible areas of growth that may provide business opportunities.

### Literature Review

Canada is considered one of the more successful models in terms of immigration.
Canada has a relatively large immigrant labour force, both temporary and permanent (Liebig, 2016).
In Canada, migrants comprise about 20.6% of the entire population. In Toronto alone, immigrants account
for 33% of the entire population (Government of Canada, 2020).

A paper on immigrant integration suggests four main factors that contribute to successful migrant integration (Government of Canada, 2020).
These factors are migrant skills, stakeholder attitudes, early access to information, and settlement location.
The challenges of attracting the right skills and talent aside, larger companies have more capacity and resources to
welcome migrants than smaller enterprises. Access to information and to relevant work networks depends
on the immigrant support organizations in the area assigned to the migrants. According to the same paper, the region of first settlement may
not be the place of eventual settlement, and the right conditions have to be in place for migrants to feel
a sense of connection and community. Another article suggests that tax relief and assistance with acquiring a residence would help migrants settle better (Hosseini, 2017).
Whichever the case may be, the new migrant labourer requires access to employment, which would probably be provided by larger
employers because of their capacity and their networks with supporting organizations. The exception might be nursing care workers, who use a specialized route for migration (Liston & Carens, 2008)
and who would gravitate toward areas where larger populations of elderly people reside.

On the other hand, survival entrepreneurs need access to people of similar culture and ethnic background to support their business (Chrysostome, 2010).
This means that successful survival entrepreneurs would gravitate toward areas with a large settlement of a particular ethnic group.

### Data

One advantage of using Toronto as a city of study is the availability of data collected in the open data initiative of the city. 

Using Foursquare in combination with business registrations in the city, we are able to home in on businesses that can potentially provide employment for would-be migrants. We detail the nature of the businesses in the city to get a good picture for matching with the migrants. Census data can show the concentration of elderly residents in the city, another source of employment for aspiring migrants.

Census data is also helpful in locating the residential areas and concentrations of particular ethnic groups, which are vital for survival entrepreneurs.

The primary data tools of choice are descriptive in nature: they describe what is already on the ground, namely the communities and the possible employment opportunities. However, it may also be useful to run some clustering analysis to validate assumptions drawn from the literature review. Cluster analysis may provide some serendipitous insights as well.

Aside from Foursquare and map data, the following datasets would be used:
- Toronto Open Data: https://open.toronto.ca
    - Toronto Demographics 2016: https://open.toronto.ca/dataset/wellbeing-toronto-demographics/
    - Neighbourhood Profiles 2016: https://open.toronto.ca/dataset/neighbourhood-profiles/
    - Municipal Licensing and Standards - Business Licences and Permits: https://open.toronto.ca/dataset/municipal-licensing-and-standards-business-licences-and-permits/
    - Toronto Employment Survey Summary Tables: https://open.toronto.ca/dataset/toronto-employment-survey-summary-tables/


### Methodology

The Jupyter notebook for this analysis can be found here: https://github.com/gtq/Coursera_Capstone/blob/master/Battle_of_Neighborhoods_Notebook.ipynb

The goal of this project is essentially exploratory, since so many factors are required for successful immigration. Basic data exploration, including descriptive statistics and correlation, was carried out, and some clustering was done to uncover potential patterns in the data. The initial thinking in this study was to follow the money. Therefore, the primary indicators followed were income and business activity.

Business registration data was explored for useful key features, such as where businesses were located and what kind of services they offered. This was based primarily on municipal registration data, supplemented by Foursquare data.
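
The grouping of licences into broad categories per area can be sketched as follows. This is a minimal illustration in standard-library Python, not the notebook's actual code: the licence records, the `CATEGORY_GROUPS` mapping, and the area codes are hypothetical stand-ins for the municipal licensing data.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from raw licence categories to broad groups.
CATEGORY_GROUPS = {
    "EATING ESTABLISHMENT": "Food/Beverage",
    "RETAIL STORE (FOOD)": "Food/Beverage",
    "BUILDING RENOVATOR": "Contractor/Repairman",
    "TAXICAB OWNER": "Transportation",
}

# Hypothetical licence records: (area code, raw licence category).
licences = [
    ("M5V", "EATING ESTABLISHMENT"),
    ("M5V", "RETAIL STORE (FOOD)"),
    ("M5V", "TAXICAB OWNER"),
    ("M4C", "BUILDING RENOVATOR"),
    ("M4C", "EATING ESTABLISHMENT"),
    ("M4C", "PAWN SHOP"),
]


def count_by_area(records, groups, default="Other"):
    """Return {area: Counter({group: number of licences})}."""
    counts = defaultdict(Counter)
    for area, category in records:
        # Categories missing from the mapping fall into the default group.
        counts[area][groups.get(category, default)] += 1
    return dict(counts)


counts = count_by_area(licences, CATEGORY_GROUPS)
for area, per_group in sorted(counts.items()):
    print(area, dict(per_group))
```

Once every area has one count per group, summary statistics such as the mean, quartiles, and maximum can be computed per group across areas.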

Employment data was also explored, covering both the industries that employees worked in and the location of that employment.

Demographic data was checked as well. Initially, the well-being demographics survey was used because it was better formatted for analysis. However, that data only had average income, so neighbourhood profile data was added to supplement the analysis. We then compared average income against average rent.

Since the literature review indicated the importance of ethno-cultural ties for survival entrepreneurs, the relationship of the language groups to average family income was investigated. Family income by decile was also correlated against the Chinese-language population. Apart from the possibility that Chinese residents have capital, they are the most numerous ethnic group and are assumed here, like many Asian communities, to network closely with one another.

Lastly, using Foursquare data as input, filtered by selected interest areas derived from the previous analysis steps, a cluster analysis was conducted to check for possible unexpected insights in the data.
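
The clustering step can be illustrated with a small k-means sketch. It uses only the standard library and is not the notebook's implementation: the per-area venue counts are invented, and the farthest-point seeding is a deterministic simplification chosen here so the example is reproducible. Comparing the inertia (the within-cluster sum of squared distances) across values of k is one common way to choose k.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical per-area venue counts: (food venues, office venues).
venue_counts = [
    (50, 40), (52, 42), (48, 38),   # busy, downtown-like areas
    (10, 5), (12, 6), (8, 4),       # quiet areas
    (30, 2), (32, 3), (28, 1),      # food-heavy areas with few offices
]

labels, centroids, inertia = kmeans(venue_counts, k=3)
print(labels, inertia)

# Inertia for several values of k, to look for an "elbow".
inertia_by_k = {k: kmeans(venue_counts, k)[2] for k in range(1, 5)}
print(inertia_by_k)
```

On this toy data the inertia drops sharply up to k = 3 and little afterwards, which is the kind of pattern that would support choosing three clusters.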





### Results

![business_reg_graph.png](attachment:business_reg_graph.png)

|       | Contractor/Repairman | Entertainment | Fireworks  | Food/Beverage | Other       | Services   | Temporary Sign | Transportation |
|-------|----------------------|---------------|------------|---------------|-------------|------------|----------------|----------------|
| count | 120.000000           | 120.000000    | 120.000000 | 120.000000    | 120.000000  | 120.000000 | 120.000000     | 120.000000     |
| mean  | 81.791667            | 12.966667     | 4.616667   | 484.983333    | 369.066667  | 101.925000 | 36.241667      | 60.233333      |
| std   | 70.188330            | 16.865724     | 9.082427   | 460.856783    | 293.166793  | 86.332479  | 43.433325      | 61.319762      |
| min   | 0.000000             | 0.000000      | 0.000000   | 0.000000      | 0.000000    | 0.000000   | 0.000000       | 0.000000       |
| 25%   | 11.500000            | 1.000000      | 0.000000   | 104.250000    | 128.750000  | 28.750000  | 6.000000       | 8.500000       |
| 50%   | 72.500000            | 8.000000      | 1.000000   | 403.000000    | 344.000000  | 89.000000  | 25.500000      | 45.500000      |
| 75%   | 140.250000           | 19.000000     | 5.250000   | 701.750000    | 557.250000  | 155.250000 | 46.250000      | 94.750000      |
| max   | 244.000000           | 94.000000     | 61.000000  | 2402.000000   | 1240.000000 | 406.000000 | 217.000000     | 276.000000     |

The business listings were grouped under the categories shown above. A large portion of registrations falls into the food and beverage category. While the data shows some peculiar characteristics that may be useful to the survival entrepreneur, we cannot see the money in this picture. Notably, there are no banks.

![rent_vs_income.png](attachment:rent_vs_income.png)

In the well-being data, we see that rent averages out at a certain point. In some areas, average rent is practically half of average income. However, there are areas where average income far exceeds the cost of rent, which is promising for following the money trail. One pitfall of using the well-being data for this purpose is that it only contains average income. To offset this limitation, we use the neighbourhood data, which contains a wealth of information, including ethnicity and income categorized by decile.
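
A rent-versus-income comparison of this kind can be sketched as a simple ratio per neighbourhood. The figures and field names below are hypothetical, and the sketch assumes that rent is reported monthly and income annually; the units in the actual datasets should be checked before applying it.

```python
# Hypothetical neighbourhood figures; names and numbers are placeholders.
# Assumption: rent is monthly and income is annual, so rent is annualised.
neighbourhoods = [
    {"name": "Area A", "avg_monthly_rent": 900, "avg_annual_income": 36000},
    {"name": "Area B", "avg_monthly_rent": 1000, "avg_annual_income": 80000},
    {"name": "Area C", "avg_monthly_rent": 1400, "avg_annual_income": 420000},
]


def rent_share(row):
    """Fraction of annual income spent on rent."""
    return 12 * row["avg_monthly_rent"] / row["avg_annual_income"]


# Rank neighbourhoods from the lowest to the highest rent burden.
ranked = sorted(neighbourhoods, key=rent_share)
for row in ranked:
    print(f'{row["name"]}: rent is {rent_share(row):.0%} of income')
```

Sorting by this share puts the neighbourhoods where income most exceeds rent at the top of the list.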

|       | Total Area | Total Population | Pop - Males | Pop - Females | Chinese     | South Asian | Black       | Filipino   | Latin American | Southeast Asian | Arab       | West Asian | Korean     | Japanese  | Other Visible Minority | Multiple Visible Minority | Not a Visible Minority | Language - Chinese | Language - Italian | Language - Spanish | Language - Tagalog | Language - Tamil | Language - Urdu | Non-Movers  | Movers      | Recent Immigrants | In Labour Force | Unemployed | Not in Labour Force | Less than grade 9 | With College Certificate/Diploma | With Bachelor Degree or Higher | Seniors Living Alone | Total Tenants | High Shelter Costs | Owned Dwellings | Rented Dwellings | Home Repairs Needed | Tenant Average Rent | Low Income Families | Low Income Singles | Low Income Children | Average Family Income | Pre-Tax Household Income | After-Tax Household Income |
|-------|------------|------------------|-------------|---------------|-------------|-------------|-------------|------------|----------------|-----------------|------------|------------|------------|-----------|------------------------|---------------------------|------------------------|--------------------|--------------------|--------------------|--------------------|------------------|-----------------|-------------|-------------|-------------------|-----------------|------------|---------------------|-------------------|----------------------------------|--------------------------------|----------------------|---------------|--------------------|-----------------|------------------|---------------------|---------------------|---------------------|--------------------|---------------------|-----------------------|--------------------------|----------------------------|
| count | 118        | 118              | 118         | 118           | 118         | 118         | 118         | 118        | 118            | 118             | 118        | 118        | 118        | 118       | 118                    | 118                       | 118                    | 118                | 118                | 118                | 118                | 118              | 118             | 118         | 118         | 118               | 118             | 118        | 118                 | 118               | 118                              | 118                            | 118                  | 118           | 118                | 118             | 118              | 118                 | 118                 | 118                 | 118                | 118                 | 118                   | 118                      | 118                        |
| mean  | 4.757542   | 18395.254237     | 8885.169492 | 9545.042373   | 2168.305085 | 2255.847458 | 1523.983051 | 746.101695 | 439.067797     | 262.330508      | 168.813559 | 313.940678 | 251.271186 | 86.355932 | 185.974576             | 230.338983                | 9534.067797            | 1515.29661         | 306.567797         | 294.364407         | 246.271186         | 398.516949       | 229.915254      | 9385.59322  | 7796.271186 | 1968.644068       | 9901.313559     | 748.389831 | 5276.313559         | 790.127119        | 1535.847458                      | 245.508475                     | 647.669492           | 3235.889831   | 2586.610169        | 3935.508475     | 3251.567797      | 549.745763          | 936.694915          | 4771.779661         | 1184.220339        | 343.915254          | 80608.008475          | 59020.847458             | 60830.364407               |
| std   | 4.867967   | 8527.693871      | 4237.216316 | 4548.853124   | 3541.420367 | 3296.417137 | 1673.081573 | 787.951237 | 524.780764     | 408.066465      | 212.509103 | 471.402187 | 477.512781 | 69.101873 | 279.978066             | 212.043102                | 4614.596914            | 2783.110638        | 556.795509         | 389.636883         | 296.61926          | 856.707399       | 474.922388      | 4250.897601 | 4570.217466 | 1750.867997       | 4709.50228      | 438.799782 | 2712.696047         | 468.601968        | 888.672376                       | 210.260715                     | 364.366727           | 2236.254465   | 1445.338348        | 2086.285231     | 2226.942511      | 284.476463          | 151.053594          | 2307.918969         | 811.841814         | 323.237141          | 49492.863157          | 22575.664478             | 24067.492426               |
| min   | 0.4        | 5450             | 2940        | 2975          | 50          | 65          | 10          | 25         | 0              | 0               | 0          | 0          | 0          | 0         | 0                      | 10                        | 1580                   | 25                 | 0                  | 0                  | 0                  | 0                | 0               | 2520        | 2395        | 0                 | 3600            | 155        | 1420                | 105               | 440                              | 30                             | 115                  | 130           | 300                | 300             | 135              | 95                  | 550                 | 1205                | 108                | 0                   | 34825                 | 24775                    | 25562                      |
| 25%   | 2.025      | 12101.25         | 6007.5      | 6061.25       | 450         | 402.5       | 410         | 196.25     | 140            | 56.25           | 35         | 56.25      | 56.25      | 40        | 50                     | 81.25                     | 5992.5                 | 201.25             | 35                 | 75                 | 50                 | 0                | 10              | 6291.25     | 4522.5      | 676.25            | 6636.25         | 441.25     | 3258.75             | 475               | 922.5                            | 100                            | 387.5                | 1653.75       | 1510               | 2453.75         | 1682.5           | 336.25              | 842.5               | 3113.75             | 662.25             | 130                 | 58208.75              | 46920                    | 49098.75                   |
| 50%   | 3.55       | 15782.5          | 7605        | 8272.5        | 885         | 965         | 960         | 480        | 287.5          | 142.5           | 85         | 125        | 105        | 65        | 95                     | 160                       | 8527.5                 | 465                | 92.5               | 157.5              | 120                | 50               | 45              | 8185        | 6637.5      | 1572.5            | 8627.5          | 622.5      | 4735                | 680               | 1247.5                           | 182.5                          | 540                  | 2820          | 2285               | 3382.5          | 2895             | 465                 | 900                 | 4075                | 1012               | 213                 | 68045                 | 54452.5                  | 55816.5                    |
| 75%   | 5.5        | 22278.75         | 10527.5     | 11587.5       | 1733.75     | 2823.75     | 1900        | 1020       | 537.5          | 286.25          | 206.25     | 405        | 232.5      | 120       | 192.5                  | 340                       | 12017.5                | 1157.5             | 342.5              | 401.25             | 342.5              | 308.75           | 210             | 11367.5     | 9720        | 2608.75           | 12225           | 987.5      | 6576.25             | 942.5             | 1898.75                          | 313.75                         | 825                  | 4138.75       | 3347.5             | 5010            | 4133.75          | 738.75              | 1030                | 5772.5              | 1447.5             | 457.75              | 81013.75              | 62671.25                 | 64589.5                    |
| max   | 37.6       | 45865            | 25555       | 26905         | 16790       | 17920       | 8730        | 4255       | 3475           | 3350            | 1180       | 3395       | 4265       | 300       | 1485                   | 1210                      | 22250                  | 13750              | 3715               | 2725               | 1440               | 5425             | 3865            | 24775       | 25825       | 9140              | 25160           | 2390       | 16410               | 2565              | 4920                             | 1300                           | 1810                 | 11900         | 7705               | 11745           | 11900            | 1465                | 1405                | 13860               | 4602               | 1647                | 423850                | 208310                   | 211492                     |

|                      | chinese language | bottom income decile | 2nd income decile | 3rd income decile | 4th income decile | 5th income decile |
|----------------------|------------------|----------------------|-------------------|-------------------|-------------------|-------------------|
| chinese language     | 1.000000         | 0.535922             | 0.445532          | 0.400154          | 0.375390          | 0.377489          |
| bottom income decile | 0.535922         | 1.000000             | 0.827265          | 0.758598          | 0.735213          | 0.717851          |

Using correlation to compare the most numerous ethnic group (the Chinese) with income, we did not find a very strong correlation. However, the correlation between the Chinese-language population and income grows stronger as the income decile falls: it is highest (0.54) for the bottom decile and drops to about 0.38 for the 4th and 5th deciles.
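
The coefficients in the table are Pearson correlations. As a minimal sketch of how one such coefficient is computed, the following uses only the standard library; the per-neighbourhood counts are invented for illustration and are not taken from the census data. In a pandas workflow, `DataFrame.corr()` computes the same coefficient by default.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical counts for five neighbourhoods.
chinese_language = [100, 400, 900, 1500, 3000]
bottom_decile = [200, 350, 600, 800, 1500]   # people in the bottom income decile
fifth_decile = [500, 450, 520, 480, 510]     # people in the 5th income decile

r_bottom = pearson(chinese_language, bottom_decile)
r_fifth = pearson(chinese_language, fifth_decile)
print(f"bottom decile: {r_bottom:.3f}, 5th decile: {r_fifth:.3f}")
```

A coefficient computed this way describes how the counts move together across neighbourhoods; it does not by itself say anything about the incomes of individual residents.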

![employment_areas.png](attachment:employment_areas.png)

In this employment data set, the bulk of employment goes to offices, followed by institutional employment such as government and schools. This corresponds more closely with the Toronto that we read about. The next section discusses the apparent contradiction between the business registration data and the employment data.

|       | Manufacturing | Retail      | Service    | Office       | Institutional | Community/Entertainment | Total       |
|-------|---------------|-------------|------------|--------------|---------------|-------------------------|-------------|
| count | 32.000000     | 32.000000   | 32.00000   | 32.000000    | 32.000000     | 32.000000               | 32.00000    |
| mean  | 247.812500    | 935.312500  | 1231.25000 | 5542.187500  | 1865.000000   | 479.687500              | 10299.06250 |
| std   | 730.796006    | 1541.423691 | 1703.45143 | 8575.634176  | 3527.497514   | 914.466121              | 12040.29347 |
| min   | 0.000000      | 0.000000    | 0.00000    | 0.000000     | 0.000000      | 0.000000                | 0.00000     |
| 25%   | 0.000000      | 55.000000   | 85.00000   | 222.500000   | 172.500000    | 10.000000               | 837.50000   |
| 50%   | 30.000000     | 235.000000  | 580.00000  | 940.000000   | 400.000000    | 195.000000              | 5240.00000  |
| 75%   | 165.000000    | 1005.000000 | 1745.00000 | 8282.500000  | 2025.000000   | 387.500000              | 16872.50000 |
| max   | 4100.000000   | 7160.000000 | 8320.00000 | 31300.000000 | 15710.000000  | 4480.000000             | 45080.00000 |

|                         | Manufacturing | Retail   | Service  | Office   | Institutional | Community/Entertainment | Total    |
|-------------------------|---------------|----------|----------|----------|---------------|-------------------------|----------|
| Manufacturing           | 1.000000      | 0.045216 | 0.146448 | 0.005875 | -0.046916     | 0.005318                | 0.078069 |
| Retail                  | 0.045216      | 1.000000 | 0.467448 | 0.399142 | 0.102422      | 0.072432                | 0.516846 |
| Service                 | 0.146448      | 0.467448 | 1.000000 | 0.904150 | 0.042782      | 0.554819                | 0.908797 |
| Office                  | 0.005875      | 0.399142 | 0.904150 | 1.000000 | 0.012724      | 0.531598                | 0.935632 |
| Institutional           | -0.046916     | 0.102422 | 0.042782 | 0.012724 | 1.000000      | -0.007540               | 0.317827 |
| Community/Entertainment | 0.005318      | 0.072432 | 0.554819 | 0.531598 | -0.007540     | 1.000000                | 0.540419 |
| Total                   | 0.078069      | 0.516846 | 0.908797 | 0.935632 | 0.317827      | 0.540419                | 1.000000 |

Lastly, we ran clustering on the Foursquare data to find any patterns that we had missed, as well as some of the more obvious ones. The optimal K for the clustering was found to be 3. Downtown central Toronto has the most establishments, especially food establishments, which lines up well with our business registration information. We also notice that the M5 areas have the most employees. The choropleth map below ties all the other data together.

![map_clusters.jpeg](attachment:map_clusters.jpeg)

### Discussion

With further literature review, we discover that Toronto is the financial center of Canada. Its manufacturing also produces half the output for Canada. Clearly, there is money here, based on incomes and on the literature review. Why, then, does the municipal business registration not show the banks?

It turns out that in Canada, there are two ways to register a business: provincial registration and federal registration. We did not find any data on federal registration in the Toronto data portal. Banks are presumably registered federally, which would explain their absence from the municipal data. The banks would be where the money is, so we cannot use the business registration data as a direct indicator of money.

However, business registration data that heavily indicates where the food businesses are is a good indicator of where people are, because people have to eat. The presence of people does not ensure the presence of money. However, we have another source of data that quantifies employment. The areas with the most food businesses also have the most employees working in offices, and banks would be just one kind of office.

Rent is one aspect examined. In a number of neighbourhoods, the average rent is about half of average income, which leaves an employee with the other half for daily needs. In a system with many government services, this might not be too bad. On the upside, there are a fair number of neighbourhoods where income is more than ten times the cost of rent. This is something worth looking at.

In the results portion, income deciles were correlated with ethnicity. While income does not show which areas have high rates of survival entrepreneurs, we can hypothesize that there is some relationship. The result was an inverse relationship between Chinese population and income: the larger the Chinese-language population, the stronger the correlation with the lower income deciles. This is a surprising result. However, perhaps it also describes survival entrepreneurs well, since they go into business because they need to survive. It is possible that areas with dense Chinese populations have more residents who cannot speak English well, who therefore struggle to find employment and have to go into business.

Richer Chinese residents, by contrast, would be able to afford English education and earn better incomes. Interestingly, communities with high incomes are less ethnically diverse.

### Conclusion

In this project, we were able to cover the basic needs, in particular food and shelter. These are basics that new immigrants need to think about when moving to Toronto. With this information, a migrant can make an informed decision about how much income to aim for and how much rent to pay. We did not stop with employment: we also looked at entrepreneurship as a possible way of migrating to Toronto. However, business data was not as easy to come by.

There is a wealth of information to be derived from extensive analysis of the neighbourhood data. From among the data sets in this project, this was probably the richest dataset. Further insights can still be derived from further analysis of this data set.

As rich as the demographic data was for getting a picture of Toronto, there were limits to what could be derived because of the granularity of the data available. More granular data is most probably available to the census office. However, within the time allotted, it did not seem readily available.

### Bibliography

“8 Things Immigrants Should Know About Working in Canada.” Randstad Canada, n.d. https://www.randstad.ca/job-seeker/career-resources/working-in-canada/8-things-immigrants-should-know-about-working-in-canada/.
    
Canada, Employment and Social Development. “Survival to Success: Transforming Immigrant Outcomes.” Canada.ca. Government of Canada, June 8, 2020. https://www.canada.ca/en/employment-social-development/programs/foreign-credential-recognition/consultations.html.
        
Chrysostome, Elie. “The Success Factors of Necessity Immigrant Entrepreneurs: In Search of a Model.” Thunderbird International Business Review, March 2010. https://www.researchgate.net/publication/229907244_The_success_factors_of_necessity_immigrant_entrepreneurs_In_search_of_a_model.

“Economy of Toronto.” Wikipedia. Wikimedia Foundation, November 16, 2020. https://en.wikipedia.org/wiki/Economy_of_Toronto.

Griffith, Andrew. “Building a Mosaic: The Evolution of Canada's Approach to Immigrant Integration.” migrationpolicy.org. Migration Policy Institute, April 3, 2019. https://www.migrationpolicy.org/article/building-mosaic-evolution-canadas-approach-immigrant-integration.
    
Hosseini, Mana. “Is Canada Doing All It Can To Integrate New Immigrants?” Canadian Citizenship & Immigration Resource Center (CCIRC), August 21, 2017. https://www.immigration.ca/canada-can-integrate-new-immigrants.
   
Innovation, Science and Economic Development Canada. “Registering Your Business with the Government.” Canada.ca. Government of Canada, August 29, 2019. https://www.canada.ca/en/services/business/start/register-with-gov.html.

Liebig, Thomas. “Recruiting for Success Challenges for Canada’s Labour Migration System.” OECD.org, November 2016. http://www.oecd.org/migration/mig/recruiting-for-success-Canada.pdf.

Liston, Mary, and Joseph Carens. “Immigration and Integration in Canada.” Allard Research Commons. University of British Columbia, 2008. https://commons.allard.ubc.ca/cgi/viewcontent.cgi?article=1208&context=fac_pubs.