### Agenda

- Data Analysis with Pandas and Numpy

    - Introduction
    - Sorting
    - Indexing and retrieving data
    - Applying Functions to Cells, Columns and Rows
    - Grouping
    - Summary table
    - Pivot table
    - DataFrame transformations
    

### Data Analysis with Pandas and Numpy

In [513]:
import pandas as pd
import numpy as np

### Introduction

This notebook is devoted to analyzing a loan approval classification dataset, mostly with Pandas and NumPy.

I will start the analysis by reading the .csv file with `pandas.read_csv`.

In [514]:
df = pd.read_csv('Bank_loan.csv', skipinitialspace=True)
df.head()  # show the first 5 rows of the loan approval dataset

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |
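The `skipinitialspace=True` argument strips the spaces that follow each comma. The sketch below shows the effect on a small in-memory CSV. The sample rows and the names `raw` and `sample` are made up for illustration and are not read from `Bank_loan.csv`.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as Bank_loan.csv, with a space after every comma
raw = io.StringIO(
    "Loan_ID, Gender, LoanAmount, Loan_Status\n"
    "LP001002, Male, , Y\n"
    "LP001003, Male, 128, N\n"
)

# skipinitialspace=True drops the spaces that follow each delimiter
sample = pd.read_csv(raw, skipinitialspace=True)

print(sample.columns.tolist())  # ['Loan_ID', 'Gender', 'LoanAmount', 'Loan_Status']
print(sample)                   # the empty LoanAmount field is read as NaN
```

Without the argument, the header names and values would keep their leading spaces (for example `' Gender'`), and selecting `sample['Gender']` would fail.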


Now, let's get some insight into the dimensions, feature names (column names) and feature types of our dataset.

In [515]:
df.shape 

(614, 13)

Here, `df.shape` returns a tuple with the dimensionality of the DataFrame (rows, columns). In our case, it shows that we have 614 rows (clients) and 13 columns (features) for each client.

In [516]:
df.columns

Index(['Loan_ID', 'Gender', 'Married', 'Dependents', 'Education',
       'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount',
       'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'Loan_Status'],
      dtype='object')

In [517]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 13 columns):
Loan_ID              614 non-null object
Gender               601 non-null object
Married              611 non-null object
Dependents           599 non-null object
Education            614 non-null object
Self_Employed        582 non-null object
ApplicantIncome      614 non-null int64
CoapplicantIncome    614 non-null float64
LoanAmount           592 non-null float64
Loan_Amount_Term     600 non-null float64
Credit_History       564 non-null float64
Property_Area        614 non-null object
Loan_Status          614 non-null object
dtypes: float64(4), int64(1), object(8)
memory usage: 62.5+ KB


The `df.info()` method prints information about a DataFrame, including the index dtype, the column dtypes, the non-null counts and the memory usage.
As the output shows, not every column has the 614 non-null values that `df.shape` reported: several columns have fewer, which means they contain missing (null) values.

Also, we have three data types: four float64 columns, one int64 column and eight object columns.

In [518]:
df.isnull().sum()

Loan_ID               0
Gender               13
Married               3
Dependents           15
Education             0
Self_Employed        32
ApplicantIncome       0
CoapplicantIncome     0
LoanAmount           22
Loan_Amount_Term     14
Credit_History       50
Property_Area         0
Loan_Status           0
dtype: int64

Here we used `isnull().sum()` to count the missing (null) values in each column: `isnull()` marks every missing cell as True, and `sum()` counts the True values per column.
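A minimal sketch of the two steps on a tiny made-up frame (`toy`, `mask` and the values are illustrative only). Taking `mean()` instead of `sum()` on the same mask gives the fraction of missing values per column.

```python
import numpy as np
import pandas as pd

# A small made-up frame with a few gaps (None and np.nan are both treated as missing)
toy = pd.DataFrame({
    'Gender': ['Male', None, 'Female', 'Male'],
    'LoanAmount': [np.nan, 128.0, np.nan, 141.0],
})

mask = toy.isnull()                 # boolean DataFrame: True where a value is missing
missing_per_column = mask.sum()     # True counts as 1, so this is the number of gaps per column
missing_share = mask.mean()         # the fraction of rows that are missing in each column

print(missing_per_column)
print(missing_share)
```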

In [519]:
df = df.bfill()  # backward fill: replace each missing value with the next valid value below it in the same column

In [520]:
df.isnull().sum() #Checking again to see if we still have null values. 

Loan_ID              0
Gender               0
Married              0
Dependents           0
Education            0
Self_Employed        0
ApplicantIncome      0
CoapplicantIncome    0
LoanAmount           0
Loan_Amount_Term     0
Credit_History       0
Property_Area        0
Loan_Status          0
dtype: int64
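Backward fill copies the next valid value in the same column, so an imputed value comes from a different applicant and depends on row order. It also cannot fill a gap in the last row, because there is no later value; the check above shows that this did not happen here. The sketch below uses a made-up column and, as one common alternative (not what this notebook does), fills with the column median instead.

```python
import numpy as np
import pandas as pd

# Made-up LoanAmount column with a gap at the start, in the middle and at the very end
toy = pd.DataFrame({'LoanAmount': [np.nan, 128.0, np.nan, 66.0, np.nan]})

# Backward fill: each NaN is replaced by the next valid value below it
filled = toy.bfill()
print(filled['LoanAmount'].tolist())  # [128.0, 128.0, 66.0, 66.0, nan]

# Alternative: fill with the column median, which does not depend on row order
median_filled = toy['LoanAmount'].fillna(toy['LoanAmount'].median())
print(median_filled.tolist())         # [97.0, 128.0, 97.0, 66.0, 97.0]
```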

For numerical data, the `describe` method reports the count, mean, std (standard deviation), min and max, as well as the lower, 50th and upper percentiles. By default the lower percentile is the 25th and the upper is the 75th; the 50th percentile is the median.

In [521]:
df.describe()

| | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History |
|---|---|---|---|---|---|
| count | 614.0 | 614.0 | 614.0 | 614.0 | 614.0 |
| mean | 5403.459283 | 1621.245798 | 146.416938 | 342.410423 | 0.84202 |
| std | 6109.041673 | 2926.248369 | 84.917398 | 64.428629 | 0.36502 |
| min | 150.0 | 0.0 | 9.0 | 12.0 | 0.0 |
| 25% | 2877.5 | 0.0 | 100.0 | 360.0 | 1.0 |
| 50% | 3812.5 | 1188.5 | 128.0 | 360.0 | 1.0 |
| 75% | 5795.0 | 2297.25 | 166.75 | 360.0 | 1.0 |
| max | 81000.0 | 41667.0 | 700.0 | 480.0 | 1.0 |
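The percentiles can be changed with the `percentiles` argument; the median is always included. A minimal sketch on a made-up income Series (`income` and its values are illustrative only):

```python
import pandas as pd

# A few made-up income values
income = pd.Series([150, 2877, 3812, 5795, 81000])

default_summary = income.describe()                        # 25%, 50% and 75% by default
custom_summary = income.describe(percentiles=[0.1, 0.9])   # 10%, 50% and 90%; the median is always added

print(custom_summary)
```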


For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns. If the dataframe consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of attributes of each type.

In [522]:
df.describe(include='all')

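| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 614 | 614 | 614 | 614.0 | 614 | 614 | 614.0 | 614.0 | 614.0 | 614.0 | 614.0 | 614 | 614 |
| unique | 614 | 2 | 2 | 4.0 | 2 | 2 | | | | | | 3 | 2 |
| top | LP002379 | Male | Yes | 0.0 | Graduate | No | | | | | | Semiurban | Y |
| freq | 1 | 501 | 399 | 354.0 | 480 | 528 | | | | | | 233 | 422 |
| mean | | | | | | | 5403.459283 | 1621.245798 | 146.416938 | 342.410423 | 0.84202 | | |
| std | | | | | | | 6109.041673 | 2926.248369 | 84.917398 | 64.428629 | 0.36502 | | |
| min | | | | | | | 150.0 | 0.0 | 9.0 | 12.0 | 0.0 | | |
| 25% | | | | | | | 2877.5 | 0.0 | 100.0 | 360.0 | 1.0 | | |
| 50% | | | | | | | 3812.5 | 1188.5 | 128.0 | 360.0 | 1.0 | | |
| 75% | | | | | | | 5795.0 | 2297.25 | 166.75 | 360.0 | 1.0 | | |
| max | | | | | | | 81000.0 | 41667.0 | 700.0 | 480.0 | 1.0 | | |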


You can also restrict the summary to particular types, for example only the object columns, which hold the categorical and binary features:

In [523]:
df.describe(include='object')

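| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|
| count | 614 | 614 | 614 | 614 | 614 | 614 | 614 | 614 |
| unique | 614 | 2 | 2 | 4 | 2 | 2 | 3 | 2 |
| top | LP002379 | Male | Yes | 0 | Graduate | No | Semiurban | Y |
| freq | 1 | 501 | 399 | 354 | 480 | 528 | 233 | 422 |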


For categorical features we can use the `value_counts()` method to count unique values. The result is sorted in descending order, so the first element is the most frequently occurring one.

Let's look at the distribution of Loan_Status in our dataset.

In [524]:
df['Loan_Status'].value_counts()

Y    422
N    192
Name: Loan_Status, dtype: int64

This shows that 422 of the 614 applicants were approved for the loan and 192 were not.

We can also pass the parameter `normalize=True`, which returns the relative frequencies of the unique values instead of the counts.

In [525]:
df['Loan_Status'].value_counts(normalize=True)

Y    0.687296
N    0.312704
Name: Loan_Status, dtype: float64
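The relative frequencies are simply the counts divided by the number of rows. The sketch below rebuilds a made-up `status` Series with the same 422/192 split as above (not the real column) and checks this.

```python
import pandas as pd

# Rebuild a Loan_Status column with the same counts as above: 422 'Y' and 192 'N'
status = pd.Series(['Y'] * 422 + ['N'] * 192, name='Loan_Status')

counts = status.value_counts()                  # Y: 422, N: 192 (most frequent first)
shares = status.value_counts(normalize=True)    # the same counts divided by len(status)

print(shares)
```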

### Sorting

A DataFrame can be sorted by the values of one or more columns with the `sort_values` method. For example, here we sort by 'ApplicantIncome', and then by 'ApplicantIncome' and 'Loan_Status'.

In [526]:
df.sort_values(by='ApplicantIncome', ascending=False).head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 409 | LP002317 | Male | Yes | 3+ | Graduate | No | 81000 | 0.0 | 360.0 | 360.0 | 0.0 | Rural | N |
| 333 | LP002101 | Male | Yes | 0 | Graduate | Yes | 63337 | 0.0 | 490.0 | 180.0 | 1.0 | Urban | Y |
| 171 | LP001585 | Male | Yes | 3+ | Graduate | No | 51763 | 0.0 | 700.0 | 300.0 | 1.0 | Urban | Y |
| 155 | LP001536 | Male | Yes | 3+ | Graduate | No | 39999 | 0.0 | 600.0 | 180.0 | 0.0 | Semiurban | Y |
| 185 | LP001640 | Male | Yes | 0 | Graduate | Yes | 39147 | 4750.0 | 120.0 | 360.0 | 1.0 | Semiurban | Y |


In [527]:
df.sort_values(['ApplicantIncome','Loan_Status'], ascending=[False, True]).head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 409 | LP002317 | Male | Yes | 3+ | Graduate | No | 81000 | 0.0 | 360.0 | 360.0 | 0.0 | Rural | N |
| 333 | LP002101 | Male | Yes | 0 | Graduate | Yes | 63337 | 0.0 | 490.0 | 180.0 | 1.0 | Urban | Y |
| 171 | LP001585 | Male | Yes | 3+ | Graduate | No | 51763 | 0.0 | 700.0 | 300.0 | 1.0 | Urban | Y |
| 155 | LP001536 | Male | Yes | 3+ | Graduate | No | 39999 | 0.0 | 600.0 | 180.0 | 0.0 | Semiurban | Y |
| 185 | LP001640 | Male | Yes | 0 | Graduate | Yes | 39147 | 4750.0 | 120.0 | 360.0 | 1.0 | Semiurban | Y |
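In the output above the five highest incomes are all different, so the second sort key has no visible effect; it only decides the order among rows with equal incomes. A minimal sketch with made-up tied rows (`toy` is illustrative only):

```python
import pandas as pd

# Made-up rows where ApplicantIncome ties, so the second key decides the order
toy = pd.DataFrame({
    'ApplicantIncome': [5000, 7000, 5000, 7000],
    'Loan_Status': ['Y', 'N', 'N', 'Y'],
})

# Income descending first; within equal incomes, Loan_Status ascending ('N' before 'Y')
ordered = toy.sort_values(['ApplicantIncome', 'Loan_Status'], ascending=[False, True])
print(ordered)
```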


### Indexing and retrieving data

There are a variety of different ways to index a DataFrame.

- `.loc` is primarily label based, which means we index rows and columns by their labels (for columns, their names).
- `.iloc` is primarily integer position based (from 0 to length-1 of the axis), which means we index rows and columns by their positions.
- To get a single column, you can also use the `DataFrame['Name']` construction.

With `.loc`, the first argument selects rows by label (0 to 6) and the second selects columns by label (Dependents through LoanAmount). Label-based slices include both endpoints, so all seven rows 0 to 6 are returned.

In [528]:
df.loc[0:6, 'Dependents':'LoanAmount']

| | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount |
|---|---|---|---|---|---|---|
| 0 | 0 | Graduate | No | 5849 | 0.0 | 128.0 |
| 1 | 1 | Graduate | No | 4583 | 1508.0 | 128.0 |
| 2 | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 |
| 3 | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 |
| 4 | 0 | Graduate | No | 6000 | 0.0 | 141.0 |
| 5 | 2 | Graduate | Yes | 5417 | 4196.0 | 267.0 |
| 6 | 0 | Not Graduate | No | 2333 | 1516.0 | 95.0 |


With `.iloc`, the first argument selects rows by position (0:6) and the second selects columns by position (3:9). As with ordinary Python slices, the end position is excluded, so we get rows 0 to 5 and columns 3 to 8 (Dependents through LoanAmount).

In [529]:
df.iloc[0:6, 3:9]

| | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount |
|---|---|---|---|---|---|---|
| 0 | 0 | Graduate | No | 5849 | 0.0 | 128.0 |
| 1 | 1 | Graduate | No | 4583 | 1508.0 | 128.0 |
| 2 | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 |
| 3 | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 |
| 4 | 0 | Graduate | No | 6000 | 0.0 | 141.0 |
| 5 | 2 | Graduate | Yes | 5417 | 4196.0 | 267.0 |
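The difference in slice endpoints is easy to check on a small frame. This is a minimal sketch with a made-up 5x3 frame (`toy`, columns `a`, `b`, `c`) that uses the default integer index, where labels and positions coincide:

```python
import pandas as pd

# A made-up 5x3 frame with the default integer index 0..4
toy = pd.DataFrame({'a': range(10, 15), 'b': range(20, 25), 'c': range(30, 35)})

by_label = toy.loc[0:2, 'a':'b']    # label slices include both ends: rows 0, 1, 2 and columns a, b
by_position = toy.iloc[0:2, 0:2]    # position slices exclude the end: rows 0, 1 and columns a, b

print(by_label.shape, by_position.shape)  # (3, 2) (2, 2)
```

With a non-integer or shuffled index, `.loc` and `.iloc` can select entirely different rows, not just a different number of them.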


Indexing a single column with brackets is also convenient. Let's use it to answer a question about that column alone:

**What proportion of applicants in our DataFrame were approved for the loan?**

In [530]:
df['Loan_Status'] = df['Loan_Status'].map({'Y': 1, 'N': 0})  # encode Y/N as 1/0 so that the mean gives the approval rate

In [531]:
df['Loan_Status'].mean()

0.6872964169381107

This means that about 68.7% of our applicants were approved for the bank loan and about 31.3% were not. (The mean of a 0/1 column is the proportion of ones.)

**Boolean indexing** with one column is also very convenient. The syntax is df[P(df['Name'])], where P is some logical condition that is checked for each element of the Name column. The result of such indexing is the DataFrame consisting only of rows that satisfy the P condition on the Name column.
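Concretely, the condition produces a boolean Series with one value per row, and passing it in brackets keeps the rows marked True. A minimal sketch on made-up data (`toy` and `condition` are illustrative names):

```python
import pandas as pd

# Made-up data; Loan_Status is already encoded as 1 (approved) / 0 (not approved)
toy = pd.DataFrame({
    'ApplicantIncome': [5849, 4583, 3000, 2583],
    'Loan_Status': [1, 0, 1, 1],
})

condition = toy['Loan_Status'] == 1   # P(df['Name']): a boolean Series, one value per row
approved = toy[condition]             # keeps only the rows where the condition is True

print(condition.tolist())             # [True, False, True, True]
print(approved)
```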

To see how this works, let me ask some questions.

**What are the average values of the numerical features for approved (Loan_Status = 1) and rejected (Loan_Status = 0) applicants?**

In [532]:
df[df['Loan_Status'] == 1].mean(numeric_only=True)  # numeric_only=True skips the text columns

ApplicantIncome      5384.068720
CoapplicantIncome    1504.516398
LoanAmount            144.135071
Loan_Amount_Term      341.431280
Credit_History          0.969194
Loan_Status             1.000000
dtype: float64

In [533]:
df[df['Loan_Status'] == 0].mean(numeric_only=True)  # numeric_only=True skips the text columns

ApplicantIncome      5446.078125
CoapplicantIncome    1877.807292
LoanAmount            151.432292
Loan_Amount_Term      344.562500
Credit_History          0.562500
Loan_Status             0.000000
dtype: float64

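As you can see, the average Credit_History is much higher among approved applicants (0.97) than among rejected ones (0.56), which suggests that credit history is strongly associated with loan approval.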

**What is the average applicant income of customers who were approved for the loan?**

In [534]:
df[df['Loan_Status'] == 1]['ApplicantIncome'].mean()

5384.068720379147

**What is the average applicant income of customers who were not approved for the loan?**

In [535]:
df[df['Loan_Status'] == 0]['ApplicantIncome'].mean()

5446.078125

**What is the largest loan amount among approved applications?**

In [536]:
df[df['Loan_Status'] == 1]['LoanAmount'].max()

700.0

**What is the largest loan amount among approved applicants who had no credit history (both conditions combined with &)?**

In [537]:
df[(df['Loan_Status'] == 1) & (df['Credit_History'] == 0)]['LoanAmount'].max()

600.0
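Conditions are combined element-wise with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==` in Python; without them the expression is parsed differently and typically raises an error. The sketch below uses made-up rows (`toy` and the mask names are illustrative) and also shows `.loc[row_mask, column]`, which filters rows and picks a column in one step instead of two chained brackets.

```python
import pandas as pd

# Made-up rows
toy = pd.DataFrame({
    'Loan_Status': [1, 1, 0, 1],
    'Credit_History': [0.0, 1.0, 0.0, 0.0],
    'LoanAmount': [600.0, 700.0, 90.0, 150.0],
})

# Each comparison is wrapped in parentheses before combining with & or |
approved_no_history = (toy['Loan_Status'] == 1) & (toy['Credit_History'] == 0)
rejected_or_large = (toy['Loan_Status'] == 0) | (toy['LoanAmount'] > 650)

# .loc[row_mask, column] filters rows and selects a column in a single step
print(toy.loc[approved_no_history, 'LoanAmount'].max())    # 600.0
print(toy.loc[rejected_or_large, 'LoanAmount'].tolist())   # [700.0, 90.0]
print(toy.loc[~approved_no_history].index.tolist())        # [1, 2]
```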

**Select all female applicants with income of more than 5000!**

In [538]:
df[(df['Gender'] == 'Female') & (df['ApplicantIncome'] > 5000)].head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 54 | LP001186 | Female | Yes | 1 | Graduate | Yes | 11500 | 0.0 | 286.0 | 360.0 | 0.0 | Urban | 0 |
| 113 | LP001392 | Female | No | 1 | Graduate | Yes | 7451 | 0.0 | 118.0 | 360.0 | 1.0 | Semiurban | 1 |
| 119 | LP001422 | Female | No | 0 | Graduate | No | 10408 | 0.0 | 259.0 | 360.0 | 1.0 | Urban | 1 |
| 146 | LP001516 | Female | Yes | 2 | Graduate | No | 14866 | 0.0 | 70.0 | 360.0 | 1.0 | Urban | 1 |
| 148 | LP001519 | Female | No | 0 | Graduate | No | 10000 | 1666.0 | 225.0 | 360.0 | 1.0 | Rural | 0 |


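### Applying Functions to Cells, Columns and Rows

The `apply` method applies a function along an axis of the DataFrame, that is, to each column or to each row.

The `axis` parameter sets the direction:

- 0 or ‘index’: apply the function to each column (the default).

- 1 or ‘columns’: apply the function to each row.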

In [539]:
df.apply(np.max)

Loan_ID                  LP002990
Gender                       Male
Married                       Yes
Dependents                     3+
Education            Not Graduate
Self_Employed                 Yes
ApplicantIncome             81000
CoapplicantIncome           41667
LoanAmount                    700
Loan_Amount_Term              480
Credit_History                  1
Property_Area               Urban
Loan_Status                     1
dtype: object
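The example above uses the default `axis=0`, so `np.max` is called once per column. With `axis=1` the function receives one row at a time. A minimal sketch with two made-up applicants (`toy`, `col_max` and `row_total` are illustrative names):

```python
import pandas as pd

# Two made-up applicants
toy = pd.DataFrame({'ApplicantIncome': [5849, 4583], 'CoapplicantIncome': [0.0, 1508.0]})

# axis=0 (default): the function receives each column and returns one value per column
col_max = toy.apply(lambda col: col.max())

# axis=1: the function receives each row and returns one value per row
row_total = toy.apply(lambda row: row['ApplicantIncome'] + row['CoapplicantIncome'], axis=1)

print(col_max)
print(row_total.tolist())  # [5849.0, 6091.0]
```

For simple arithmetic like this, the vectorized `toy['ApplicantIncome'] + toy['CoapplicantIncome']` gives the same result and is faster; row-wise `apply` is mainly useful when the logic cannot be vectorized.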

We can also pass a **lambda** function to `apply`, for example to select all applicants with an income of more than 7000.

In [540]:
df[df['ApplicantIncome'].apply(lambda x: x > 7000)].head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | LP001020 | Male | Yes | 1 | Graduate | No | 12841 | 10968.0 | 349.0 | 360.0 | 1.0 | Semiurban | 0 |
| 20 | LP001043 | Male | Yes | 0 | Not Graduate | No | 7660 | 0.0 | 104.0 | 360.0 | 0.0 | Urban | 0 |
| 25 | LP001066 | Male | Yes | 0 | Graduate | Yes | 9560 | 0.0 | 191.0 | 360.0 | 1.0 | Semiurban | 1 |
| 34 | LP001100 | Male | No | 3+ | Graduate | No | 12500 | 3000.0 | 320.0 | 360.0 | 1.0 | Rural | 0 |
| 54 | LP001186 | Female | Yes | 1 | Graduate | Yes | 11500 | 0.0 | 286.0 | 360.0 | 0.0 | Urban | 0 |
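The lambda is called once per element, while a plain comparison on the column produces the same boolean mask in one vectorized step. A minimal sketch on made-up incomes (`income`, `via_apply` and `via_vector` are illustrative names):

```python
import pandas as pd

# Made-up incomes
income = pd.Series([12841, 4583, 7660, 3000])

via_apply = income.apply(lambda x: x > 7000)   # calls the Python function once per element
via_vector = income > 7000                     # the same comparison done in one vectorized step

print(via_apply.equals(via_vector))            # True
print(income[via_vector].tolist())             # [12841, 7660]
```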


The **map** method can be used to replace values in a column by passing a dictionary of the form {old_value: new_value} as its argument:

In [541]:
n = {'Not Graduate': 0, 'Graduate': 1}
df['Education'] = df['Education'].map(n)

#df['Education'] = df['Education'].map({'Not Graduate': 0, 'Graduate': 1}) equivalent to the above code!

df.head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | No | 0 | 1 | No | 5849 | 0.0 | 128.0 | 360.0 | 1.0 | Urban | 1 |
| 1 | LP001003 | Male | Yes | 1 | 1 | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | 0 |
| 2 | LP001005 | Male | Yes | 0 | 1 | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | 1 |
| 3 | LP001006 | Male | Yes | 0 | 0 | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | 1 |
| 4 | LP001008 | Male | No | 0 | 1 | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | 1 |


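The `replace` method does almost the same thing. The difference is that `map` turns any value that is not a key of the dictionary into NaN, whereas `replace` leaves such values unchanged.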
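The difference only shows up when a column contains a value the dictionary does not cover. A minimal sketch with a made-up `education` Series that includes a hypothetical 'Unknown' category:

```python
import pandas as pd

# A made-up column containing a value that the dictionary does not cover
education = pd.Series(['Graduate', 'Not Graduate', 'Unknown'])
codes = {'Not Graduate': 0, 'Graduate': 1}

mapped = education.map(codes)        # values missing from the dict become NaN
replaced = education.replace(codes)  # values missing from the dict are left unchanged

print(mapped.tolist())    # [1.0, 0.0, nan]
print(replaced.tolist())  # [1, 0, 'Unknown']
```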

In [508]:
df['Married'] = df['Married'].replace({'No': 0, 'Yes':1})
df.head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | 0.0 | 0 | 1 | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y |
| 1 | LP001003 | Male | 1.0 | 1 | 1 | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N |
| 2 | LP001005 | Male | 1.0 | 0 | 1 | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y |
| 3 | LP001006 | Male | 1.0 | 0 | 0 | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y |
| 4 | LP001008 | Male | 0.0 | 0 | 1 | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y |


### Grouping 

In general, grouping data in Pandas follows this pattern:

*df.groupby(by=grouping_columns)[columns_to_show].function()*

1. First, the `groupby` method splits the rows into groups according to the values of grouping_columns. These values become the new index of the resulting DataFrame.


2. Then, the columns of interest (columns_to_show) are selected. If columns_to_show is omitted, all columns other than the grouping columns are used.


3. Finally, one or several functions are applied to the obtained groups per selected columns.

Here is an example where we group the data by the Loan_Status variable and display statistics for two columns in each group:



In [483]:
columns_to_show = ['ApplicantIncome', 'CoapplicantIncome']

df.groupby(['Loan_Status'])[columns_to_show].describe(percentiles=[])

| Loan_Status | ApplicantIncome count | ApplicantIncome mean | ApplicantIncome std | ApplicantIncome min | ApplicantIncome 50% | ApplicantIncome max | CoapplicantIncome count | CoapplicantIncome mean | CoapplicantIncome std | CoapplicantIncome min | CoapplicantIncome 50% | CoapplicantIncome max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 192.0 | 5446.078125 | 6819.558528 | 150.0 | 3833.5 | 81000.0 | 192.0 | 1877.807292 | 4384.060103 | 0.0 | 268.0 | 41667.0 |
| 1 | 422.0 | 5384.06872 | 5765.441615 | 210.0 | 3812.5 | 63337.0 | 422.0 | 1504.516398 | 1924.754855 | 0.0 | 1239.5 | 20000.0 |


We can compute similar statistics by passing a list of functions to `agg()`:

In [484]:
columns_to_show = ['ApplicantIncome', 'CoapplicantIncome']

df.groupby(['Loan_Status'])[columns_to_show].agg([np.mean, np.std, np.min, np.max])

| Loan_Status | ApplicantIncome mean | ApplicantIncome std | ApplicantIncome amin | ApplicantIncome amax | CoapplicantIncome mean | CoapplicantIncome std | CoapplicantIncome amin | CoapplicantIncome amax |
|---|---|---|---|---|---|---|---|---|
| 0 | 5446.078125 | 6819.558528 | 150 | 81000 | 1877.807292 | 4384.060103 | 0.0 | 41667.0 |
| 1 | 5384.06872 | 5765.441615 | 210 | 63337 | 1504.516398 | 1924.754855 | 0.0 | 20000.0 |
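`agg()` also accepts function names as strings, which gives plain column labels such as `min` and `max` instead of `amin` and `amax`; recent pandas versions may emit a FutureWarning when NumPy functions are passed. Named aggregation lets you choose the output column names yourself. A minimal sketch on made-up applicants (`toy`, `stats`, `named`, `mean_income` and `max_income` are illustrative names):

```python
import pandas as pd

# Made-up applicants: two rejected (0) and three approved (1)
toy = pd.DataFrame({
    'Loan_Status': [0, 1, 1, 0, 1],
    'ApplicantIncome': [150, 5849, 3000, 81000, 4583],
})

# A list of function names gives one column per statistic
stats = toy.groupby('Loan_Status')['ApplicantIncome'].agg(['mean', 'min', 'max'])

# Named aggregation: new_column=(source_column, function)
named = toy.groupby('Loan_Status').agg(
    mean_income=('ApplicantIncome', 'mean'),
    max_income=('ApplicantIncome', 'max'),
)

print(stats)
print(named)
```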


### Summary Table

Suppose we want to see how the observations in our dataset are distributed jointly across two variables, Loan_Status and Dependents. To do so, we can build a contingency table with the `crosstab` function:

In [542]:
pd.crosstab(df['Loan_Status'], df['Dependents'])

| Loan_Status \ Dependents | 0 | 1 | 2 | 3+ |
|---|---|---|---|---|
| 0 | 110 | 36 | 28 | 18 |
| 1 | 244 | 68 | 77 | 33 |


In [543]:
pd.crosstab(df['Loan_Status'], df['Dependents'], normalize=True)

| Loan_Status \ Dependents | 0 | 1 | 2 | 3+ |
|---|---|---|---|---|
| 0 | 0.179153 | 0.058632 | 0.045603 | 0.029316 |
| 1 | 0.397394 | 0.110749 | 0.125407 | 0.053746 |


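With `normalize=True`, each cell is a share of all 614 applicants, so the cells shrink as Dependents grows mainly because fewer applicants have many dependents; most applicants (354 of 614) have none. The approval rate within each group, computed from the counts in the first table, does not fall steadily with the number of dependents: 244/354 ≈ 68.9% for 0, 68/104 ≈ 65.4% for 1, 77/105 ≈ 73.3% for 2 and 33/51 ≈ 64.7% for 3+.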
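To get approval rates within each Dependents group directly, `crosstab` accepts `normalize='columns'`, which divides every column by its own total. The sketch below does not read the real CSV; it rebuilds one row per applicant from the counts in the Loan_Status × Dependents table (`counts`, `applicants` and `rates` are illustrative names).

```python
import pandas as pd

# Rebuild one row per applicant from the Loan_Status x Dependents counts
counts = {
    ('0', 0): 110, ('1', 0): 36, ('2', 0): 28, ('3+', 0): 18,
    ('0', 1): 244, ('1', 1): 68, ('2', 1): 77, ('3+', 1): 33,
}
rows = [(dep, status) for (dep, status), n in counts.items() for _ in range(n)]
applicants = pd.DataFrame(rows, columns=['Dependents', 'Loan_Status'])

# normalize='columns' divides each column by its own total,
# so the Loan_Status == 1 row is the approval rate within each Dependents group
rates = pd.crosstab(applicants['Loan_Status'], applicants['Dependents'], normalize='columns')
print(rates.round(3))
```

`normalize='index'` would instead divide each row by its total, giving the Dependents mix within approved and within rejected applicants.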

In [492]:
pd.crosstab(df['Loan_Status'], df['Self_Employed'], margins=True)

| Loan_Status \ Self_Employed | No | Yes | All |
|---|---|---|---|
| 0 | 164 | 28 | 192 |
| 1 | 364 | 58 | 422 |
| All | 528 | 86 | 614 |


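The table above shows that 364 of the 422 approved applicants were not self-employed, i.e. about 86% of approved applications. This is close to the share of non-self-employed applicants overall (528/614 ≈ 86%), and the approval rates of the two groups are similar (364/528 ≈ 68.9% for not self-employed vs 58/86 ≈ 67.4% for self-employed).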

### Pivot table

The pivot_table method takes the following parameters:

- values – a list of variables to calculate statistics for,
- index – a list of variables to group data by,
- aggfunc – which statistic to calculate for each group, e.g. sum, mean, max, min or something else.

In [544]:
df.pivot_table(['ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Credit_History', 'Loan_Status'],
               ['Property_Area'], aggfunc='mean')

| Property_Area | ApplicantIncome | CoapplicantIncome | Credit_History | LoanAmount | Loan_Status |
|---|---|---|---|---|---|
| Rural | 5554.083799 | 1645.536983 | 0.832402 | 151.698324 | 0.614525 |
| Semiurban | 5292.261803 | 1520.133047 | 0.858369 | 146.223176 | 0.76824 |
| Urban | 5398.247525 | 1716.350495 | 0.831683 | 141.960396 | 0.658416 |


As the pivot table shows, applicants whose property is in a Semiurban area have the highest approval rate (about 76.8%, versus 65.8% for Urban and 61.5% for Rural). The average Credit_History is similar across areas: 0.858 for Semiurban, 0.832 for Rural and 0.832 for Urban.
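With a single index and no `columns` argument, `pivot_table` computes the same numbers as a `groupby` followed by the aggregation. A minimal sketch on made-up rows (`toy`, `pivot` and `grouped` are illustrative names):

```python
import numpy as np
import pandas as pd

# Made-up rows
toy = pd.DataFrame({
    'Property_Area': ['Urban', 'Rural', 'Urban', 'Semiurban'],
    'Loan_Status': [1, 0, 0, 1],
    'LoanAmount': [128.0, 66.0, 120.0, 141.0],
})

# values = columns to summarise, index = grouping column, aggfunc = statistic
pivot = toy.pivot_table(values=['LoanAmount', 'Loan_Status'], index='Property_Area', aggfunc='mean')

# The same numbers via groupby
grouped = toy.groupby('Property_Area')[['LoanAmount', 'Loan_Status']].mean()

print(pivot)
print(np.allclose(pivot[grouped.columns].values, grouped.values))  # True
```

`pivot_table` becomes more useful than `groupby` when you also pass `columns=` to spread a second grouping variable across the columns.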

In [545]:
df.pivot_table(['Loan_Status'], ['Loan_Amount_Term'], aggfunc='sum')

| Loan_Amount_Term | Loan_Status |
|---|---|
| 12.0 | 1 |
| 36.0 | 0 |
| 60.0 | 2 |
| 84.0 | 3 |
| 120.0 | 3 |
| 180.0 | 29 |
| 240.0 | 3 |
| 300.0 | 8 |
| 360.0 | 367 |
| 480.0 | 6 |


As the pivot table shows, 367 of the 422 approved applicants chose the 360-month (30-year) loan term, which is 367/422 ≈ 86.97% $\approx$ 87% of approved applications.

### DataFrame Transformation

In this section we add new columns to our DataFrame to help us understand the dataset better.

Let's create a new column by adding ApplicantIncome and CoapplicantIncome and call it **Total_Income**. 

In [490]:
df['Total_Income'] = df['ApplicantIncome'] + df['CoapplicantIncome']
df.head()

| | Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status | Total_Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LP001002 | Male | 0 | 0 | 1 | No | 5849 | 0.0 | 128.0 | 360.0 | 1.0 | Urban | 1 | 5849.0 |
| 1 | LP001003 | Male | 1 | 1 | 1 | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | 0 | 6091.0 |
| 2 | LP001005 | Male | 1 | 0 | 1 | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | 1 | 3000.0 |
| 3 | LP001006 | Male | 1 | 0 | 0 | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | 1 | 4941.0 |
| 4 | LP001008 | Male | 0 | 0 | 1 | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | 1 | 6000.0 |
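Assigning to `df['Total_Income']` modifies `df` in place. An alternative is `assign`, which returns a new DataFrame and leaves the original untouched; this is convenient when chaining several transformations. A minimal sketch with two made-up applicants (`toy` and `with_total` are illustrative names):

```python
import pandas as pd

# Two made-up applicants
toy = pd.DataFrame({'ApplicantIncome': [5849, 4583], 'CoapplicantIncome': [0.0, 1508.0]})

# assign() returns a new DataFrame with the extra column and leaves toy unchanged
with_total = toy.assign(Total_Income=lambda d: d['ApplicantIncome'] + d['CoapplicantIncome'])

print(with_total)
```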
