# Data Processing
This project builds a dashboard that analyzes client details extracted from LinkedIn profiles and combines them with company revenue data, giving insight into client demographics and financial performance. The processing in this notebook runs on real LinkedIn data and extracts the fields needed for the analysis. Because LinkedIn provides only street addresses, the OpenCage API is used to look up the corresponding country. Note that the dashboard itself is demonstrated with a random dataset to ensure privacy and compliance.

In [2]:
import pandas as pd

In [3]:
df = pd.read_csv('Company_data.csv')

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119 entries, 0 to 118
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Company              119 non-null    object 
 1   Revenue              119 non-null    float64
 2   Name                 95 non-null     object 
 3   Company Location     87 non-null     object 
 4   Industry             95 non-null     object 
 5   Number Of Employees  95 non-null     object 
 6   Followers            95 non-null     object 
 7   URL                  95 non-null     object 
dtypes: float64(1), object(7)
memory usage: 7.6+ KB


In [17]:
#pip install geopy


In [14]:
%pip install opencage


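The cell that produced the `country` column seen in the next `df.info()` output is not shown. A minimal sketch of how such a lookup could be written with the `opencage` package follows. The helper names (`extract_country`, `make_country_lookup`, `build_opencage_lookup`) and the `OPENCAGE_API_KEY` environment variable are assumptions made for illustration, not the code the notebook actually ran.

```python
import os


def extract_country(results):
    """Return the country name from a geocoder response, or None if absent."""
    if not results:
        return None
    return results[0].get('components', {}).get('country')


def make_country_lookup(geocode):
    """Wrap a geocode callable with a cache so each distinct address is queried once."""
    cache = {}

    def lookup(address):
        # Missing values (None, NaN) and empty strings are skipped without a request.
        if not isinstance(address, str) or not address.strip():
            return None
        if address not in cache:
            cache[address] = extract_country(geocode(address))
        return cache[address]

    return lookup


def build_opencage_lookup():
    """Create a lookup backed by the OpenCage API (requires network access and a key)."""
    # Imported lazily so the helpers above can be used without the package installed.
    from opencage.geocoder import OpenCageGeocode

    # OPENCAGE_API_KEY is an assumed variable name; store the key wherever suits you.
    geocoder = OpenCageGeocode(os.environ['OPENCAGE_API_KEY'])
    return make_country_lookup(geocoder.geocode)
```

Under these assumptions, the column could be filled with `df['country'] = df['Company Location'].apply(build_opencage_lookup())`. Caching matters here because repeated addresses would otherwise each cost an API request.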


In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119 entries, 0 to 118
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Company              119 non-null    object 
 1   Revenue              119 non-null    float64
 2   Name                 95 non-null     object 
 3   Company Location     87 non-null     object 
 4   Industry             95 non-null     object 
 5   Number Of Employees  95 non-null     object 
 6   Followers            95 non-null     object 
 7   URL                  95 non-null     object 
 8   country              87 non-null     object 
dtypes: float64(1), object(8)
memory usage: 8.5+ KB


In [4]:
import re
# Function to parse the employee range and return an approximate number
def parse_employees(value):
    if pd.isna(value):  # Check for None or NaN
        return None

    # Match patterns like '1K-5K', '501-1K', '10K+', '51-200'
    match = re.match(r'(\d+)(K?)\s*-\s*(\d+)?(K?)|(\d+)(K?)\+', value)

    if match:
        low, low_k, high, high_k, single, single_k = match.groups()

        if single:  # Handle '10K+' case
            return int(single) * (1000 if single_k else 1)

        low_value = int(low) * (1000 if low_k else 1)
        high_value = int(high or 0) * (1000 if high_k else 1)

        # Return the higher end of the range as an estimate
        return max(low_value, high_value)

    # If no pattern matched, return None
    return None

# Apply the parsing function
df['mapped_employees'] = df['Number Of Employees'].apply(parse_employees)
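As a quick sanity check, the same pattern can be exercised on the example labels from the comment above without loading the dataset. This is a standalone sketch: `upper_bound` is a hypothetical stand-in for `parse_employees` that uses an `isinstance` check instead of `pd.isna`, and `'unknown'` is an invented label used only to show the fallback.

```python
import re

# Same pattern as in parse_employees, compiled once for reuse.
EMPLOYEE_RANGE = re.compile(r'(\d+)(K?)\s*-\s*(\d+)?(K?)|(\d+)(K?)\+')


def upper_bound(label):
    """Return the upper end of an employee-range label, or None if it cannot be parsed."""
    if not isinstance(label, str):
        return None
    match = EMPLOYEE_RANGE.match(label)
    if match is None:
        return None
    low, low_k, high, high_k, single, single_k = match.groups()
    if single:  # open-ended label such as '10K+'
        return int(single) * (1000 if single_k else 1)
    low_value = int(low) * (1000 if low_k else 1)
    high_value = int(high or 0) * (1000 if high_k else 1)
    return max(low_value, high_value)


samples = ['1K-5K', '501-1K', '10K+', '51-200', 'unknown']
parsed = {label: upper_bound(label) for label in samples}
print(parsed)
# {'1K-5K': 5000, '501-1K': 1000, '10K+': 10000, '51-200': 200, 'unknown': None}
```

Because the upper end of each range is used, the mapped value is an optimistic estimate of company size; open-ended labels such as `'10K+'` fall back to their lower bound.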



In [5]:

# Function to parse the follower count
def parse_followers(value):
    if pd.isna(value):  # Check for None or NaN
        return None

    # Match patterns like '3K followers', '384 followers', '10K followers'
    match = re.match(r'(\d+(?:\.\d+)?)\s*(K|M)?\s*followers', value)

    if match:
        number, scale = match.groups()

        number = float(number)  # Convert number to float for cases like '1.5K'
        
        if scale == 'K':
            return int(number * 1000)
        elif scale == 'M':
            return int(number * 1000000)
        else:
            return int(number)

    # If no pattern matched, return None
    return None

# Apply the parsing function
df['mapped_followers'] = df['Followers'].apply(parse_followers)
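The pattern above expects plain digits, so a value written with thousands separators (for example `'1,234 followers'`) would come back as `None`. Whether such values occur in the data is not stated; if they do, a more lenient variant might look like the sketch below. `parse_followers_lenient` is a hypothetical name, and it uses `round` rather than `int` to avoid truncation from floating-point error.

```python
import re

# Non-capturing group for the decimal part, so only number and scale are captured.
FOLLOWERS = re.compile(r'(\d+(?:\.\d+)?)\s*(K|M)?\s*followers')
SCALE = {None: 1, 'K': 1_000, 'M': 1_000_000}


def parse_followers_lenient(value):
    """Parse strings like '3K followers' or '1,234 followers' into an integer count."""
    if not isinstance(value, str):  # covers None and NaN
        return None
    # Assumption: commas are thousands separators, so they can simply be dropped.
    match = FOLLOWERS.match(value.replace(',', '').strip())
    if match is None:
        return None
    number, scale = match.groups()
    return round(float(number) * SCALE[scale])


print(parse_followers_lenient('1.5K followers'))   # 1500
print(parse_followers_lenient('1,234 followers'))  # 1234
```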

In [6]:
# Map raw industry labels to broader categories
industry_mapping = {
    'Advertising Services': 'Advertising and Marketing',
    'Business Content': 'Advertising and Marketing',
    
    'Airlines and Aviation': 'Aviation and Transportation',
    'Maritime': 'Aviation and Transportation',
    'Transportation, Logistics, Supply Chain and Storage': 'Aviation and Transportation',
    'Truck Transportation': 'Aviation and Transportation',
    
    'Appliances, Electrical, and Electronics Manufacturing': 'Consumer Goods and Retail',
    'Automotive': 'Consumer Goods and Retail',
    'Consumer Goods': 'Consumer Goods and Retail',
    'Retail': 'Consumer Goods and Retail',
    'Retail Apparel and Fashion': 'Consumer Goods and Retail',
    'Wholesale': 'Consumer Goods and Retail',
    'Wholesale Building Materials': 'Consumer Goods and Retail',
    
    'Education': 'Education and Training',
    'Wellness and Fitness Services': 'Education and Training',
    
    'Health, Wellness & Fitness': 'Healthcare and Medical',
    'Hospitals and Health Care': 'Healthcare and Medical',
    'Medical Equipment Manufacturing': 'Healthcare and Medical',
    'Medical Practices': 'Healthcare and Medical',
    
    'Food and Beverage Services': 'Hospitality and Travel',
    'Hospitality': 'Hospitality and Travel',
    'Travel Arrangements': 'Hospitality and Travel',
    
    'Computer Hardware Manufacturing': 'IT and Technology',
    'IT Services and IT Consulting': 'IT and Technology',
    'Software Development': 'IT and Technology',
    'Technology, Information and Internet': 'IT and Technology',
    
    'Engines and Power Transmission Equipment Manufacturing': 'Manufacturing',
    'Industrial Machinery Manufacturing': 'Manufacturing',
    'Machinery Manufacturing': 'Manufacturing',
    'Manufacturing': 'Manufacturing',
    'Textile Manufacturing': 'Manufacturing',
    
    'Book and Periodical Publishing': 'Media and Publishing',
    'Newspaper Publishing': 'Media and Publishing',
    'Performing Arts': 'Media and Publishing',
    
    'Business Consulting and Services': 'Professional Services',
    'Law Practice': 'Professional Services',
    'Legal Services': 'Professional Services',
    'Staffing and Recruiting': 'Professional Services',
    
    'Real Estate': 'Real Estate',
    'Real Estate Agents and Brokers': 'Real Estate',
    
    'Government Administration': 'Non-Profit and Government',
    'Non-profit Organizations': 'Non-Profit and Government',
    
    'Entertainment Providers': 'Entertainment'
}

# Map each industry to its category; labels missing from the dictionary become NaN
df['Category'] = df['Industry'].map(industry_mapping)
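Since `Series.map` with a dictionary yields `NaN` for any label that is not a key, it can be worth listing the industries that were left without a category. The helper below is a small sketch for that check; `find_unmapped` is a hypothetical name, and it works on any iterable of labels, including a pandas column.

```python
def find_unmapped(industries, mapping):
    """Return the sorted distinct string labels that have no key in `mapping`."""
    return sorted({
        label
        for label in industries
        # Skip missing values (None, NaN), which are not strings.
        if isinstance(label, str) and label not in mapping
    })
```

For this notebook the call would be `find_unmapped(df['Industry'], industry_mapping)`; an empty list means every non-missing industry received a category.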

In [33]:
df.to_csv('updated_dataset_with_countries.csv', index=False)