## Business Funding Data

After exploring the Business Funding Data.csv file, I observed that the data has 11 columns and 26 entries, with Effective date, Financing Type, and Financing Type Normalized having the highest numbers of null values. The dataset records a number of business funding rounds in Nigeria, along with each company's website domain, investors, investor count, the Found At timestamp, categories, and more. I imported the file with pd.read_csv using latin1 encoding.

To clean, preprocess, and transform the data, I started by checking the percentage of missing values in each column with df.isnull().sum() / len(df) * 100, after which I checked the unique values of the column with the most missing values (Effective date). I dropped this column as it was irrelevant to the analysis and, with about 77% of its values missing, its mode (counting missing values) was NaN.
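The missing-value check described above can be sketched on a small frame. The `toy` frame and its values are hypothetical and only echo the column names of the real dataset; this is an illustration, not the notebook's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration.
toy = pd.DataFrame({
    "Website Domain": ["a.com", "b.com", "c.com", "d.com"],
    "Effective date": [np.nan, np.nan, np.nan, "2024-04-18"],
    "Investors Count": [1.0, np.nan, 2.0, np.nan],
})

# Share of missing values per column, as a percentage.
missing_pct = toy.isnull().sum() / len(toy) * 100

# isnull().mean() gives the same numbers, because the mean of a boolean
# column is the fraction of True values.
missing_pct_alt = toy.isnull().mean() * 100

# Sort so the columns with the most missing values come first.
print(missing_pct.sort_values(ascending=False))
```

Sorting the result makes the candidates for dropping or imputing easy to spot at a glance.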

The mode of Financing Type Normalized, which was seed, was used to replace the NaN values in that column to remove nulls and improve data quality. The Financing Type column was dropped as its most frequent value was NaN. I also checked for duplicates, and the data had 0 duplicate rows.
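Why one column reports a mode of NaN while another reports seed comes down to the `dropna` argument of `Series.mode`. The sketch below uses a hypothetical series `s`, not the real column, to show both behaviours and a mode-based fill.

```python
import numpy as np
import pandas as pd

# Hypothetical column: 5 missing values and 3 observed labels.
s = pd.Series([np.nan, "seed", np.nan, "series_a", np.nan, "seed", np.nan, np.nan])

# With dropna=False, NaN is counted like any other value, so it wins here.
mode_with_nan = s.mode(dropna=False)[0]

# The default (dropna=True) ignores NaN and returns the most frequent label.
mode_observed = s.mode()[0]

# Fill the gaps with the observed mode; assigning the result avoids relying
# on inplace=True applied to a column selection.
filled = s.fillna(mode_observed)
print(mode_with_nan, mode_observed, filled.isna().sum())
```

Passing a positional `0` to `mode`, as in `mode(0)`, is read as `dropna=0`, which is why NaN can show up as the mode of a mostly empty column.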

For the Investors column, the mode (accelia.vc) was used to replace missing values using the fillna function. The mean of the Investors Count column (about 1.85) was calculated and used to replace the missing values in that column.
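A minimal sketch of mean imputation follows, using a hypothetical `counts` series rather than the real Investors Count column. It shows that filling with the mean leaves the column mean unchanged, while producing fractional values in what was a whole-number count.

```python
import numpy as np
import pandas as pd

# Hypothetical investor counts with three missing entries.
counts = pd.Series([9.0, np.nan, 1.0, 1.0, 2.0, np.nan, 1.0, np.nan])

# mean() skips NaN by default: (9 + 1 + 1 + 2 + 1) / 5 = 2.8
mean_observed = counts.mean()

# Replace every missing entry with the observed mean.
imputed = counts.fillna(mean_observed)

print(mean_observed)    # mean of the observed values
print(imputed.mean())   # unchanged by the fill
```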

During data transformation, label encoding was applied to the Website Domain, Financing Type Normalized, Investors, and Categories columns to convert these categorical columns to numeric ones, using LabelEncoder from the sklearn.preprocessing package.
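As an illustration of what label encoding does, the sketch below encodes a hypothetical `toy` frame and keeps one fitted encoder per column in a dictionary (the `encoders` name is an assumption for this example). Keeping the encoders makes it possible to map the integer codes back to the original labels later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical frame; the labels echo values seen in the dataset.
toy = pd.DataFrame({
    "Financing Type Normalized": ["seed", "series_b", "series_i",
                                  "series_a2", "series_a", "seed"],
    "Categories": ["[]", '["series_b", "venture"]', '["series_i", "venture"]',
                   "[]", "[]", "[]"],
})

encoded = toy.copy()
encoders = {}  # one fitted encoder per column, kept for decoding later

for col in toy.columns:
    enc = LabelEncoder()
    # Codes are assigned in sorted order of the distinct labels.
    encoded[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

print(encoded["Financing Type Normalized"].tolist())
# Map codes back to the original labels.
print(encoders["Financing Type Normalized"].inverse_transform([3, 4]))
```

Because the codes follow sorted label order, they carry no meaningful ranking; seed receiving 0 and series_i receiving 4 says nothing about the size or stage of a round.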

Using MinMaxScaler, a min-max normalization technique, the Investors Count column was rescaled from a minimum of 1 and a maximum of 9 to a minimum of 0 and a maximum of 1. This was chosen over StandardScaler as it gave a more appropriate range.
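With its default range of 0 to 1, min-max scaling computes (x - min) / (max - min) for each value. The short sketch below applies that formula by hand to the minimum, maximum, and imputed mean reported in this notebook; the helper name `min_max_scale` is an assumption for illustration.

```python
def min_max_scale(x, x_min, x_max):
    """Rescale x linearly so that x_min maps to 0 and x_max maps to 1."""
    return (x - x_min) / (x_max - x_min)

x_min, x_max = 1.0, 9.0              # range of Investors Count before scaling
imputed_mean = 1.8461538461538463    # mean used to fill the missing counts

for value in (1.0, imputed_mean, 2.0, 9.0):
    print(value, round(min_max_scale(value, x_min, x_max), 6))
```

The results (0.0, 0.105769, 0.125, and 1.0) agree with the scaled Investors Count values shown in the notebook output.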

It is important to preprocess data before real-world data analysis, as it helps handle human errors such as missing values, outliers, and duplicates, which reduce data quality and integrity.



In [39]:
import numpy as np
import pandas as pd

In [40]:
df = pd.read_csv("Business Funding Data.csv", encoding='latin1')
df.head()

Unnamed: 0,Website Domain,Effective date,Found At,Financing Type,Financing Type Normalized,Categories,Investors,Investors Count,Amount,Amount Normalized,Source Urls
0,trafigura.com,,2024-03-14T01:00:00+01:00,,,[],,,$1.9b,1900000000,https://www.tradefinanceglobal.com/posts/trafi...
1,zenobe.com,,2024-05-31T02:00:00+02:00,,,[],"avivainvestors.com, lloydsbankinggroup.com, sa...",9.0,$522.7 million,522700000,https://realassets.ipe.com/news/aviva-among-le...
2,zenobe.com,,2024-07-24T02:00:00+02:00,,,"[""private_equity""]",,,£41.7m,53671000,https://www.innovationnewsnetwork.com/zenobe-a...
3,canva.com,,2024-05-01T02:00:00+02:00,,,[],stackcapitalgroup.com,1.0,US$8 million,8000000,https://www.globenewswire.com/news-release/202...
4,fidelity.com,,2024-04-11T02:00:00+02:00,,,[],chevychasetrust.com,1.0,$1.96 million,1960000,https://www.defenseworld.net/2024/04/11/chevy-...


In [41]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 11 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Website Domain             26 non-null     object 
 1   Effective date             6 non-null      object 
 2   Found At                   26 non-null     object 
 3   Financing Type             8 non-null      object 
 4   Financing Type Normalized  8 non-null      object 
 5   Categories                 26 non-null     object 
 6   Investors                  13 non-null     object 
 7   Investors Count            13 non-null     float64
 8   Amount                     26 non-null     object 
 9   Amount Normalized          26 non-null     int64  
 10  Source Urls                26 non-null     object 
dtypes: float64(1), int64(1), object(9)
memory usage: 2.4+ KB


## Missing Values

In [42]:
# Calculate percentage of missing values for each column
missing_percentage = df.isnull().sum() / len(df) * 100
missing_percentage

Website Domain                0.000000
Effective date               76.923077
Found At                      0.000000
Financing Type               69.230769
Financing Type Normalized    69.230769
Categories                    0.000000
Investors                    50.000000
Investors Count              50.000000
Amount                        0.000000
Amount Normalized             0.000000
Source Urls                   0.000000
dtype: float64

In [43]:
df['Effective date'].unique()

array([nan, '2024-04-18T02:00:00+02:00', '2024-04-16T02:00:00+02:00',
       '2024-06-20T02:00:00+02:00', '2024-04-24T02:00:00+02:00',
       '2024-06-26T02:00:00+02:00', '2024-06-27T02:00:00+02:00'],
      dtype=object)

In [44]:
# Count NaN as a value when finding the mode
df['Effective date'].mode(dropna=False)

0    NaN
Name: Effective date, dtype: object

### Dropping the Effective date column as it has the highest number of null values

In [45]:
df.drop(columns=["Effective date"], inplace=True)
df

Unnamed: 0,Website Domain,Found At,Financing Type,Financing Type Normalized,Categories,Investors,Investors Count,Amount,Amount Normalized,Source Urls
0,trafigura.com,2024-03-14T01:00:00+01:00,,,[],,,$1.9b,1900000000,https://www.tradefinanceglobal.com/posts/trafi...
1,zenobe.com,2024-05-31T02:00:00+02:00,,,[],"avivainvestors.com, lloydsbankinggroup.com, sa...",9.0,$522.7 million,522700000,https://realassets.ipe.com/news/aviva-among-le...
2,zenobe.com,2024-07-24T02:00:00+02:00,,,"[""private_equity""]",,,£41.7m,53671000,https://www.innovationnewsnetwork.com/zenobe-a...
3,canva.com,2024-05-01T02:00:00+02:00,,,[],stackcapitalgroup.com,1.0,US$8 million,8000000,https://www.globenewswire.com/news-release/202...
4,fidelity.com,2024-04-11T02:00:00+02:00,,,[],chevychasetrust.com,1.0,$1.96 million,1960000,https://www.defenseworld.net/2024/04/11/chevy-...
5,swtchenergy.com,2024-04-24T02:00:00+02:00,Series B,series_b,"[""series_b"", ""venture""]","alantra.com, blueearth.capital",2.0,$27.2 Million,27200000,https://www.mercomindia.com/funding-and-ma-rou...
6,carnow.com,2024-04-16T02:00:00+02:00,,,"[""debt_financing""]",runwaygrowth.com,1.0,$40 million,40000000,https://www.prnewswire.com/news-releases/runwa...
7,databricks.com,2024-08-07T02:00:00+02:00,Series I,series_i,"[""series_i"", ""venture""]",,,$685 million,685000000,https://iteuropa.com/news/large-language-model...
8,anthropic.com,2024-07-08T02:00:00+02:00,,,[],damachotelsandresorts.com,1.0,$50mn,50000000,https://www.arabianbusiness.com/industries/tec...
9,ey.com,2024-04-18T02:00:00+02:00,,,[],,,AU$10.7M,6865000,https://www.biometricupdate.com/202404/ey-secu...


In [46]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Website Domain             26 non-null     object 
 1   Found At                   26 non-null     object 
 2   Financing Type             8 non-null      object 
 3   Financing Type Normalized  8 non-null      object 
 4   Categories                 26 non-null     object 
 5   Investors                  13 non-null     object 
 6   Investors Count            13 non-null     float64
 7   Amount                     26 non-null     object 
 8   Amount Normalized          26 non-null     int64  
 9   Source Urls                26 non-null     object 
dtypes: float64(1), int64(1), object(8)
memory usage: 2.2+ KB


In [47]:
df['Financing Type'].unique()

array([nan, 'Series B', 'Series I', 'Seed', 'Series A2', 'Series A'],
      dtype=object)

In [48]:
# Count NaN as a value when finding the mode
df['Financing Type'].mode(dropna=False)

0    NaN
Name: Financing Type, dtype: object

In [49]:
df['Financing Type Normalized'].mode()

0    seed
Name: Financing Type Normalized, dtype: object

### Replacing NaN values in the "Financing Type Normalized" column with the mode

In [50]:
# Assign the result back instead of using inplace=True on a column selection
df['Financing Type Normalized'] = df['Financing Type Normalized'].fillna(df['Financing Type Normalized'].mode()[0])
df

Unnamed: 0,Website Domain,Found At,Financing Type,Financing Type Normalized,Categories,Investors,Investors Count,Amount,Amount Normalized,Source Urls
0,trafigura.com,2024-03-14T01:00:00+01:00,,seed,[],,,$1.9b,1900000000,https://www.tradefinanceglobal.com/posts/trafi...
1,zenobe.com,2024-05-31T02:00:00+02:00,,seed,[],"avivainvestors.com, lloydsbankinggroup.com, sa...",9.0,$522.7 million,522700000,https://realassets.ipe.com/news/aviva-among-le...
2,zenobe.com,2024-07-24T02:00:00+02:00,,seed,"[""private_equity""]",,,£41.7m,53671000,https://www.innovationnewsnetwork.com/zenobe-a...
3,canva.com,2024-05-01T02:00:00+02:00,,seed,[],stackcapitalgroup.com,1.0,US$8 million,8000000,https://www.globenewswire.com/news-release/202...
4,fidelity.com,2024-04-11T02:00:00+02:00,,seed,[],chevychasetrust.com,1.0,$1.96 million,1960000,https://www.defenseworld.net/2024/04/11/chevy-...
5,swtchenergy.com,2024-04-24T02:00:00+02:00,Series B,series_b,"[""series_b"", ""venture""]","alantra.com, blueearth.capital",2.0,$27.2 Million,27200000,https://www.mercomindia.com/funding-and-ma-rou...
6,carnow.com,2024-04-16T02:00:00+02:00,,seed,"[""debt_financing""]",runwaygrowth.com,1.0,$40 million,40000000,https://www.prnewswire.com/news-releases/runwa...
7,databricks.com,2024-08-07T02:00:00+02:00,Series I,series_i,"[""series_i"", ""venture""]",,,$685 million,685000000,https://iteuropa.com/news/large-language-model...
8,anthropic.com,2024-07-08T02:00:00+02:00,,seed,[],damachotelsandresorts.com,1.0,$50mn,50000000,https://www.arabianbusiness.com/industries/tec...
9,ey.com,2024-04-18T02:00:00+02:00,,seed,[],,,AU$10.7M,6865000,https://www.biometricupdate.com/202404/ey-secu...


In [51]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Website Domain             26 non-null     object 
 1   Found At                   26 non-null     object 
 2   Financing Type             8 non-null      object 
 3   Financing Type Normalized  26 non-null     object 
 4   Categories                 26 non-null     object 
 5   Investors                  13 non-null     object 
 6   Investors Count            13 non-null     float64
 7   Amount                     26 non-null     object 
 8   Amount Normalized          26 non-null     int64  
 9   Source Urls                26 non-null     object 
dtypes: float64(1), int64(1), object(8)
memory usage: 2.2+ KB


### Dropping Financing Type column

In [52]:
df.drop(columns=['Financing Type'], inplace=True)
df

Unnamed: 0,Website Domain,Found At,Financing Type Normalized,Categories,Investors,Investors Count,Amount,Amount Normalized,Source Urls
0,trafigura.com,2024-03-14T01:00:00+01:00,seed,[],,,$1.9b,1900000000,https://www.tradefinanceglobal.com/posts/trafi...
1,zenobe.com,2024-05-31T02:00:00+02:00,seed,[],"avivainvestors.com, lloydsbankinggroup.com, sa...",9.0,$522.7 million,522700000,https://realassets.ipe.com/news/aviva-among-le...
2,zenobe.com,2024-07-24T02:00:00+02:00,seed,"[""private_equity""]",,,£41.7m,53671000,https://www.innovationnewsnetwork.com/zenobe-a...
3,canva.com,2024-05-01T02:00:00+02:00,seed,[],stackcapitalgroup.com,1.0,US$8 million,8000000,https://www.globenewswire.com/news-release/202...
4,fidelity.com,2024-04-11T02:00:00+02:00,seed,[],chevychasetrust.com,1.0,$1.96 million,1960000,https://www.defenseworld.net/2024/04/11/chevy-...
5,swtchenergy.com,2024-04-24T02:00:00+02:00,series_b,"[""series_b"", ""venture""]","alantra.com, blueearth.capital",2.0,$27.2 Million,27200000,https://www.mercomindia.com/funding-and-ma-rou...
6,carnow.com,2024-04-16T02:00:00+02:00,seed,"[""debt_financing""]",runwaygrowth.com,1.0,$40 million,40000000,https://www.prnewswire.com/news-releases/runwa...
7,databricks.com,2024-08-07T02:00:00+02:00,series_i,"[""series_i"", ""venture""]",,,$685 million,685000000,https://iteuropa.com/news/large-language-model...
8,anthropic.com,2024-07-08T02:00:00+02:00,seed,[],damachotelsandresorts.com,1.0,$50mn,50000000,https://www.arabianbusiness.com/industries/tec...
9,ey.com,2024-04-18T02:00:00+02:00,seed,[],,,AU$10.7M,6865000,https://www.biometricupdate.com/202404/ey-secu...


In [53]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Website Domain             26 non-null     object 
 1   Found At                   26 non-null     object 
 2   Financing Type Normalized  26 non-null     object 
 3   Categories                 26 non-null     object 
 4   Investors                  13 non-null     object 
 5   Investors Count            13 non-null     float64
 6   Amount                     26 non-null     object 
 7   Amount Normalized          26 non-null     int64  
 8   Source Urls                26 non-null     object 
dtypes: float64(1), int64(1), object(7)
memory usage: 2.0+ KB


### Checking for Duplicates

In [55]:
duplicate = df.duplicated().sum()
print(f"\nNumber of duplicate rows: {duplicate}")


Number of duplicate rows: 0


In [56]:
df.head(10)

Unnamed: 0,Website Domain,Found At,Financing Type Normalized,Categories,Investors,Investors Count,Amount,Amount Normalized,Source Urls
0,trafigura.com,2024-03-14T01:00:00+01:00,seed,[],,,$1.9b,1900000000,https://www.tradefinanceglobal.com/posts/trafi...
1,zenobe.com,2024-05-31T02:00:00+02:00,seed,[],"avivainvestors.com, lloydsbankinggroup.com, sa...",9.0,$522.7 million,522700000,https://realassets.ipe.com/news/aviva-among-le...
2,zenobe.com,2024-07-24T02:00:00+02:00,seed,"[""private_equity""]",,,£41.7m,53671000,https://www.innovationnewsnetwork.com/zenobe-a...
3,canva.com,2024-05-01T02:00:00+02:00,seed,[],stackcapitalgroup.com,1.0,US$8 million,8000000,https://www.globenewswire.com/news-release/202...
4,fidelity.com,2024-04-11T02:00:00+02:00,seed,[],chevychasetrust.com,1.0,$1.96 million,1960000,https://www.defenseworld.net/2024/04/11/chevy-...
5,swtchenergy.com,2024-04-24T02:00:00+02:00,series_b,"[""series_b"", ""venture""]","alantra.com, blueearth.capital",2.0,$27.2 Million,27200000,https://www.mercomindia.com/funding-and-ma-rou...
6,carnow.com,2024-04-16T02:00:00+02:00,seed,"[""debt_financing""]",runwaygrowth.com,1.0,$40 million,40000000,https://www.prnewswire.com/news-releases/runwa...
7,databricks.com,2024-08-07T02:00:00+02:00,series_i,"[""series_i"", ""venture""]",,,$685 million,685000000,https://iteuropa.com/news/large-language-model...
8,anthropic.com,2024-07-08T02:00:00+02:00,seed,[],damachotelsandresorts.com,1.0,$50mn,50000000,https://www.arabianbusiness.com/industries/tec...
9,ey.com,2024-04-18T02:00:00+02:00,seed,[],,,AU$10.7M,6865000,https://www.biometricupdate.com/202404/ey-secu...


In [61]:
df['Investors'].unique()

array([nan,
       'avivainvestors.com, lloydsbankinggroup.com, santander.co.uk, swip.com, cibc.com, societegenerale.com, natwest.us, rabobank.com, mufg.jp',
       'stackcapitalgroup.com', 'chevychasetrust.com',
       'alantra.com, blueearth.capital', 'runwaygrowth.com',
       'damachotelsandresorts.com', 'surocap.com', 'eib.org',
       'vistaragrowth.com', 'accelia.vc',
       'edc.ca, desjardinscapital.com, fondsftq.com', 'cibc.com',
       'inovia.vc'], dtype=object)

In [63]:
df['Investors'].mode()[0]

'accelia.vc'

In [64]:
# Fill missing investors with the most frequent value
df['Investors'] = df['Investors'].fillna(df['Investors'].mode()[0])
df

Unnamed: 0,Website Domain,Found At,Financing Type Normalized,Categories,Investors,Investors Count,Amount,Amount Normalized,Source Urls
0,trafigura.com,2024-03-14T01:00:00+01:00,seed,[],accelia.vc,,$1.9b,1900000000,https://www.tradefinanceglobal.com/posts/trafi...
1,zenobe.com,2024-05-31T02:00:00+02:00,seed,[],"avivainvestors.com, lloydsbankinggroup.com, sa...",9.0,$522.7 million,522700000,https://realassets.ipe.com/news/aviva-among-le...
2,zenobe.com,2024-07-24T02:00:00+02:00,seed,"[""private_equity""]",accelia.vc,,£41.7m,53671000,https://www.innovationnewsnetwork.com/zenobe-a...
3,canva.com,2024-05-01T02:00:00+02:00,seed,[],stackcapitalgroup.com,1.0,US$8 million,8000000,https://www.globenewswire.com/news-release/202...
4,fidelity.com,2024-04-11T02:00:00+02:00,seed,[],chevychasetrust.com,1.0,$1.96 million,1960000,https://www.defenseworld.net/2024/04/11/chevy-...
5,swtchenergy.com,2024-04-24T02:00:00+02:00,series_b,"[""series_b"", ""venture""]","alantra.com, blueearth.capital",2.0,$27.2 Million,27200000,https://www.mercomindia.com/funding-and-ma-rou...
6,carnow.com,2024-04-16T02:00:00+02:00,seed,"[""debt_financing""]",runwaygrowth.com,1.0,$40 million,40000000,https://www.prnewswire.com/news-releases/runwa...
7,databricks.com,2024-08-07T02:00:00+02:00,series_i,"[""series_i"", ""venture""]",accelia.vc,,$685 million,685000000,https://iteuropa.com/news/large-language-model...
8,anthropic.com,2024-07-08T02:00:00+02:00,seed,[],damachotelsandresorts.com,1.0,$50mn,50000000,https://www.arabianbusiness.com/industries/tec...
9,ey.com,2024-04-18T02:00:00+02:00,seed,[],accelia.vc,,AU$10.7M,6865000,https://www.biometricupdate.com/202404/ey-secu...


In [65]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Website Domain             26 non-null     object 
 1   Found At                   26 non-null     object 
 2   Financing Type Normalized  26 non-null     object 
 3   Categories                 26 non-null     object 
 4   Investors                  26 non-null     object 
 5   Investors Count            13 non-null     float64
 6   Amount                     26 non-null     object 
 7   Amount Normalized          26 non-null     int64  
 8   Source Urls                26 non-null     object 
dtypes: float64(1), int64(1), object(7)
memory usage: 2.0+ KB


In [66]:
df['Investors Count'].mean()

1.8461538461538463

In [67]:
# Fill missing investor counts with the column mean
df['Investors Count'] = df['Investors Count'].fillna(df['Investors Count'].mean())
df

Unnamed: 0,Website Domain,Found At,Financing Type Normalized,Categories,Investors,Investors Count,Amount,Amount Normalized,Source Urls
0,trafigura.com,2024-03-14T01:00:00+01:00,seed,[],accelia.vc,1.846154,$1.9b,1900000000,https://www.tradefinanceglobal.com/posts/trafi...
1,zenobe.com,2024-05-31T02:00:00+02:00,seed,[],"avivainvestors.com, lloydsbankinggroup.com, sa...",9.0,$522.7 million,522700000,https://realassets.ipe.com/news/aviva-among-le...
2,zenobe.com,2024-07-24T02:00:00+02:00,seed,"[""private_equity""]",accelia.vc,1.846154,£41.7m,53671000,https://www.innovationnewsnetwork.com/zenobe-a...
3,canva.com,2024-05-01T02:00:00+02:00,seed,[],stackcapitalgroup.com,1.0,US$8 million,8000000,https://www.globenewswire.com/news-release/202...
4,fidelity.com,2024-04-11T02:00:00+02:00,seed,[],chevychasetrust.com,1.0,$1.96 million,1960000,https://www.defenseworld.net/2024/04/11/chevy-...
5,swtchenergy.com,2024-04-24T02:00:00+02:00,series_b,"[""series_b"", ""venture""]","alantra.com, blueearth.capital",2.0,$27.2 Million,27200000,https://www.mercomindia.com/funding-and-ma-rou...
6,carnow.com,2024-04-16T02:00:00+02:00,seed,"[""debt_financing""]",runwaygrowth.com,1.0,$40 million,40000000,https://www.prnewswire.com/news-releases/runwa...
7,databricks.com,2024-08-07T02:00:00+02:00,series_i,"[""series_i"", ""venture""]",accelia.vc,1.846154,$685 million,685000000,https://iteuropa.com/news/large-language-model...
8,anthropic.com,2024-07-08T02:00:00+02:00,seed,[],damachotelsandresorts.com,1.0,$50mn,50000000,https://www.arabianbusiness.com/industries/tec...
9,ey.com,2024-04-18T02:00:00+02:00,seed,[],accelia.vc,1.846154,AU$10.7M,6865000,https://www.biometricupdate.com/202404/ey-secu...


## Transformation

### Label Encoding

In [69]:
df['Website Domain'].unique()

array(['trafigura.com', 'zenobe.com', 'canva.com', 'fidelity.com',
       'swtchenergy.com', 'carnow.com', 'databricks.com', 'anthropic.com',
       'ey.com', 'openpipe.ai', 'syntetica.co', 'zf.com', 'sparelabs.com',
       'e-zinc.ca', 'biointelligence.com', 'claritisoftware.com',
       'heylist.com', 'qohash.com', 'topicflow.com', 'gaiia.com',
       'sinnstudio.com'], dtype=object)

In [70]:
df['Financing Type Normalized'].unique()

array(['seed', 'series_b', 'series_i', 'series_a2', 'series_a'],
      dtype=object)

In [71]:
df['Investors'].unique()

array(['accelia.vc',
       'avivainvestors.com, lloydsbankinggroup.com, santander.co.uk, swip.com, cibc.com, societegenerale.com, natwest.us, rabobank.com, mufg.jp',
       'stackcapitalgroup.com', 'chevychasetrust.com',
       'alantra.com, blueearth.capital', 'runwaygrowth.com',
       'damachotelsandresorts.com', 'surocap.com', 'eib.org',
       'vistaragrowth.com', 'edc.ca, desjardinscapital.com, fondsftq.com',
       'cibc.com', 'inovia.vc'], dtype=object)

In [72]:
df['Categories'].unique()

array(['[]', '["private_equity"]', '["series_b", "venture"]',
       '["debt_financing"]', '["series_i", "venture"]',
       '["seed", "venture"]', '["series_a2", "venture"]',
       '["series_a", "venture"]', '["private_equity", "venture"]'],
      dtype=object)

In [76]:
from sklearn.preprocessing import LabelEncoder
label_encode = LabelEncoder()

In [77]:
# List of categorical columns to encode
categorical_cols = ['Website Domain', 'Financing Type Normalized', 'Investors', 'Categories']

# Apply Label encoding to each categorical column
for col in categorical_cols:
    df[col] = label_encode.fit_transform(df[col])

In [78]:
df

Unnamed: 0,Website Domain,Found At,Financing Type Normalized,Categories,Investors,Investors Count,Amount,Amount Normalized,Source Urls
0,18,2024-03-14T01:00:00+01:00,0,8,0,1.846154,$1.9b,1900000000,https://www.tradefinanceglobal.com/posts/trafi...
1,19,2024-05-31T02:00:00+02:00,0,8,2,9.0,$522.7 million,522700000,https://realassets.ipe.com/news/aviva-among-le...
2,19,2024-07-24T02:00:00+02:00,0,2,0,1.846154,£41.7m,53671000,https://www.innovationnewsnetwork.com/zenobe-a...
3,2,2024-05-01T02:00:00+02:00,0,8,10,1.0,US$8 million,8000000,https://www.globenewswire.com/news-release/202...
4,8,2024-04-11T02:00:00+02:00,0,8,3,1.0,$1.96 million,1960000,https://www.defenseworld.net/2024/04/11/chevy-...
5,15,2024-04-24T02:00:00+02:00,3,6,1,2.0,$27.2 Million,27200000,https://www.mercomindia.com/funding-and-ma-rou...
6,3,2024-04-16T02:00:00+02:00,0,0,9,1.0,$40 million,40000000,https://www.prnewswire.com/news-releases/runwa...
7,5,2024-08-07T02:00:00+02:00,4,7,0,1.846154,$685 million,685000000,https://iteuropa.com/news/large-language-model...
8,0,2024-07-08T02:00:00+02:00,0,8,5,1.0,$50mn,50000000,https://www.arabianbusiness.com/industries/tec...
9,7,2024-04-18T02:00:00+02:00,0,8,0,1.846154,AU$10.7M,6865000,https://www.biometricupdate.com/202404/ey-secu...


In [79]:
# Reconfirming the percentage of missing values in columns
missing_percentage = df.isnull().sum() / len(df) * 100
missing_percentage

Website Domain               0.0
Found At                     0.0
Financing Type Normalized    0.0
Categories                   0.0
Investors                    0.0
Investors Count              0.0
Amount                       0.0
Amount Normalized            0.0
Source Urls                  0.0
dtype: float64

## Scaling (Min-Max Normalization)

In [81]:
df['Investors Count'].min()

1.0

In [82]:
df['Investors Count'].max()

9.0

In [93]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

In [94]:
# Apply MinMaxScaler to the Investors Count column (rescales values to the 0-1 range)
df['Investors Count'] = scaler.fit_transform(df[['Investors Count']])
df

Unnamed: 0,Website Domain,Found At,Financing Type Normalized,Categories,Investors,Investors Count,Amount,Amount Normalized,Source Urls
0,18,2024-03-14T01:00:00+01:00,0,8,0,0.105769,$1.9b,1900000000,https://www.tradefinanceglobal.com/posts/trafi...
1,19,2024-05-31T02:00:00+02:00,0,8,2,1.0,$522.7 million,522700000,https://realassets.ipe.com/news/aviva-among-le...
2,19,2024-07-24T02:00:00+02:00,0,2,0,0.105769,£41.7m,53671000,https://www.innovationnewsnetwork.com/zenobe-a...
3,2,2024-05-01T02:00:00+02:00,0,8,10,0.0,US$8 million,8000000,https://www.globenewswire.com/news-release/202...
4,8,2024-04-11T02:00:00+02:00,0,8,3,0.0,$1.96 million,1960000,https://www.defenseworld.net/2024/04/11/chevy-...
5,15,2024-04-24T02:00:00+02:00,3,6,1,0.125,$27.2 Million,27200000,https://www.mercomindia.com/funding-and-ma-rou...
6,3,2024-04-16T02:00:00+02:00,0,0,9,0.0,$40 million,40000000,https://www.prnewswire.com/news-releases/runwa...
7,5,2024-08-07T02:00:00+02:00,4,7,0,0.105769,$685 million,685000000,https://iteuropa.com/news/large-language-model...
8,0,2024-07-08T02:00:00+02:00,0,8,5,0.0,$50mn,50000000,https://www.arabianbusiness.com/industries/tec...
9,7,2024-04-18T02:00:00+02:00,0,8,0,0.105769,AU$10.7M,6865000,https://www.biometricupdate.com/202404/ey-secu...


In [95]:
df['Investors Count'].min()

0.0

In [96]:
df['Investors Count'].max()

1.0

In [97]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26 entries, 0 to 25
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Website Domain             26 non-null     int64  
 1   Found At                   26 non-null     object 
 2   Financing Type Normalized  26 non-null     int64  
 3   Categories                 26 non-null     int64  
 4   Investors                  26 non-null     int64  
 5   Investors Count            26 non-null     float64
 6   Amount                     26 non-null     object 
 7   Amount Normalized          26 non-null     int64  
 8   Source Urls                26 non-null     object 
dtypes: float64(1), int64(5), object(3)
memory usage: 2.0+ KB
