![monitor](monitor_pic.jpg)


My current computer monitor is getting a bit dated, and my eyes have started to feel sore from looking at it all day. I want to buy a new one, but every time I go online to start searching I get overwhelmed by the number of monitors available. They come in a wide variety of sizes, widths, resolutions, display types, port types, etc., and vary in price from under \$100 to well over \$1000. I know that I want a bigger screen to give me more working area, but other options, like gaming features, are irrelevant to me. How on earth will I ever be able to decide which monitor is right for me?

This seems like a perfect opportunity to put some data analysis into action! For this project I am going to acquire feature data for a number of different computer monitors and analyze those features to help me make a data-driven decision about which monitor to purchase.

### First Step: Get the data
Unfortunately, I was unable to find a preexisting downloadable dataset for computer monitors, so I had to build the dataset myself. I looked through a few websites that carried a wide variety of monitors from different manufacturers, but the available data was limited to fairly basic specifications. For this project I wanted a dataset with a rich feature set so that I could explore more subtle differences between features and try to tease out their value. LG is one of the brands that appeals to me, and when I looked at their website I found that they provide over 50 data points for each monitor, in a format that wouldn't be too difficult to scrape. So the first part of this first step is to build a set of functions that scrape, clean and format the data for one monitor into a dataframe. Later I will run those functions across multiple monitors and compile the results into one dataset.

In [177]:
import pandas as pd
from bs4 import BeautifulSoup
import requests

In [109]:
# downloading the selected spec tags for a given monitor from the LG website

results = requests.get('https://www.lg.com/us/monitors/lg-32ul500-w-4k-uhd-monitor')
content = results.content
parser = BeautifulSoup(content, 'lxml')

table = list(parser.find_all('div', class_="tech-spacs"))

# the result is a list with 10 elements
# each element is a BeautifulSoup tag holding the spec data for one aspect of the monitor

table_type = type(table)
print(f"returned object is a {table_type} with {len(table)} elements ")

returned object is a <class 'list'> with 10 elements 


In [149]:
# cleaning the first element and turning it into a list

first_table = table[0].text.replace('\n', ' ')
first_table = first_table.replace('\t', ' ')
first_table = first_table.strip()
first_table_listed = first_table.split('   ')

In [133]:
# cleaning up the list: dropping empty strings, separating the header and stripping whitespace

q_list = [element for element in first_table_listed if element != '']
q_header = q_list[0]
q_body = q_list[1:]
mapped_q = [q.strip() for q in q_body]
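
To see what the two cleaning cells do without hitting the website, here is a minimal sketch that runs the same steps on a made-up string. The sample text and the `split_spec_text` helper are assumptions for illustration only; the real LG markup may be laid out differently.

```python
def split_spec_text(raw_text):
    """Split a whitespace-padded spec block into a header and its fields."""
    # Same approach as the cells above: newlines and tabs become spaces,
    # then the text is split on runs of three spaces.
    flattened = raw_text.replace('\n', ' ').replace('\t', ' ').strip()
    pieces = flattened.split('   ')
    # Drop empty pieces and trim leftover padding on the rest.
    fields = [piece.strip() for piece in pieces if piece.strip()]
    return fields[0], fields[1:]


# Made-up sample, loosely imitating the layout of one spec section.
sample = "\n\tDISPLAY\n\n\t\tSize\n\t\t32\"\n\n\t\tDisplay Type\n\t\tVA\n"
header, body = split_spec_text(sample)
print(header)
print(body)
```

The header comes back separately from the body, which alternates between spec names and spec values. That alternation is what the next cell relies on.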

In [152]:
# separating the list into category and value lists and recombining them as a dictionary

category = [mapped_q[i] for i in range(len(mapped_q)) if i%2 == 0]
value = [mapped_q[i] for i in range(len(mapped_q)) if i%2 != 0]
monitor_dict = dict(zip(category, value))
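
One thing to keep in mind with this pairing step is that `zip` silently drops a trailing entry that has no partner, so a missing value on the page would go unnoticed. The sketch below does the same even/odd pairing with slices and adds a length check. The `pair_specs` name is hypothetical, and the sample values are taken from the first spec section.

```python
def pair_specs(fields):
    """Pair alternating name/value entries into a dict."""
    if len(fields) % 2 != 0:
        raise ValueError(f"expected name/value pairs, got {len(fields)} entries")
    names = fields[0::2]   # even positions hold the spec names
    values = fields[1::2]  # odd positions hold the matching values
    return dict(zip(names, values))


fields = ['Size', '32"', 'Display Type', 'VA', 'Refresh Rate', '60Hz']
specs = pair_specs(fields)
print(specs)
```

Because the result is a dictionary, a spec name that appears twice would keep only its last value.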

In [154]:
# putting the values into a dataframe: one row for the monitor, one column per attribute

df = pd.DataFrame(data=monitor_dict, index=[0])
df

| | Size | Display Type | Response Time | Refresh Rate | Display Resolution | Color Gamut (Typ.) | Color Depth (Number of Colors) | Pixel Pitch (mm) | Aspect Ratio | Resolution | Brightness | Contrast Ratio | Viewing Angle | Surface Treatment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 32" | VA | 4ms (GtG at Faster) | 60Hz | 4K UHD | DCI-P3 95% (CIE1976) | 1.07B | 0.181x 0.181 mm | 16:9 | 3840 x 2160 | 300cd/m² | 3000:1 | 178˚(R/L), 178˚(U/D) | Anti-Glare |


In [170]:
# building a function for each transformation step applied to the scraped elements

def download_monitor_specs(url):
    """Download a monitor's spec sections and return them as a list of parsed elements"""
    
    # requests and BeautifulSoup are already imported at the top of the notebook
    results = requests.get(url)
    content = results.content
    parser = BeautifulSoup(content, 'lxml')
    list_of_strings = list(parser.find_all('div', class_="tech-spacs"))
    return list_of_strings
    
def clean_element(list_of_strings):
    """Flatten the text of each scraped element and split it into a list of raw fields"""
    
    listed_element_lists = []
    for element in list_of_strings:
        text = element.text.replace('\n', ' ')
        text = text.replace('\t', ' ')
        text = text.strip()
        listed_text = text.split('   ')
        listed_element_lists.append(listed_text)
    return listed_element_lists

def clean_listed_element(listed_element_lists):
    """Remove empty strings, drop each section header and strip whitespace from the fields"""
    
    cleaned_element_list = []
    for list_ in listed_element_lists:
        listed_element = [element for element in list_ if element != '']
        listed_element = listed_element[1:]  # the first entry is the section header
        listed_element = [element.strip() for element in listed_element]
        cleaned_element_list.append(listed_element)
    return cleaned_element_list

def list_to_dict(cleaned_element_list):
    """Separate list into categories and values then combine into dictionary"""
    
    categories = []
    values = []
    for element in cleaned_element_list:
        category = [element[i] for i in range(len(element)) if i % 2 == 0]
        categories += category
        value = [element[i] for i in range(len(element)) if i % 2 != 0]
        values += value
    
    monitor_dict = dict(zip(categories, values))
    return monitor_dict

def dict_to_df(monitor_dict):
    """Convert dictionary of monitor specs into a one-row pandas DataFrame"""
    
    # pandas is already imported at the top of the notebook
    df = pd.DataFrame(data=monitor_dict, index=[0])
    return df



In [None]:
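# running the steps one at a time, passing each result on to the next function

url = 'https://www.lg.com/us/monitors/lg-32ul500-w-4k-uhd-monitor'
list_of_strings = download_monitor_specs(url)
listed_element_lists = clean_element(list_of_strings)
cleaned_element_list = clean_listed_element(listed_element_lists)
monitor_dict = list_to_dict(cleaned_element_list)
df = dict_to_df(monitor_dict)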

In [174]:
# testing out the function pipeline so far

url = 'https://www.lg.com/us/monitors/lg-32ul500-w-4k-uhd-monitor'

df = dict_to_df(
    list_to_dict(
    clean_listed_element(
    clean_element(
    download_monitor_specs(url)
    ))))

df.shape

(1, 52)
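
The nested calls above have to be read from the inside out. One possible alternative is a small helper that applies the steps in the order they are listed. This is only a sketch: `run_pipeline` is a hypothetical name, and the stand-in steps are plain string functions so that it runs without a network connection.

```python
from functools import reduce


def run_pipeline(value, *steps):
    """Feed value through each step in order and return the final result."""
    return reduce(lambda result, step: step(result), steps, value)


# Stand-in steps so the sketch runs offline; with the notebook's functions
# the call would be run_pipeline(url, download_monitor_specs, clean_element,
# clean_listed_element, list_to_dict, dict_to_df).
result = run_pipeline('  Size  ', str.strip, str.upper, len)
print(result)
```

Both forms do the same work. The helper just keeps the order of the steps visible as the pipeline grows.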

In [176]:
df

| | Size | Display Type | Response Time | Refresh Rate | Display Resolution | Color Gamut (Typ.) | Color Depth (Number of Colors) | Pixel Pitch (mm) | Aspect Ratio | Resolution | ... | Shipping Dimensions (WxHxD) | With Stand Weight | Without Stand Weight | Shipping Weight | Display Position Adjustments | Wall Mount Size (mm) | Display Port | 2020 Model | Limited Warranty | UPC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 32" | VA | 4ms (GtG at Faster) | 60Hz | 4K UHD | DCI-P3 95% (CIE1976) | 1.07B | 0.181x 0.181 mm | 16:9 | 3840 x 2160 | ... | 32.5" x 19.9" x 8.9" | 13.7 lbs | 11.7 lbs | 21.6 lbs | Tilt | 100 x 100 mm | Yes | Yes | 1 Year Parts and Labor | 719192641761 |
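
Since each monitor comes back as a one-row dataframe, compiling several of them into one dataset could be done with `pd.concat`. The sketch below assumes the per-monitor dictionaries have the shape `list_to_dict` returns. The first dictionary reuses a few values from the table above, and the second is invented purely to show what happens when two monitors list different specs.

```python
import pandas as pd

# One dict per monitor, in the shape list_to_dict returns.
# The first uses values from the table above; the second is invented.
monitor_dicts = [
    {'Size': '32"', 'Display Type': 'VA', 'Refresh Rate': '60Hz'},
    {'Size': '27"', 'Display Type': 'IPS', 'HDR': 'Yes'},  # hypothetical
]

frames = [pd.DataFrame(data=d, index=[0]) for d in monitor_dicts]
# ignore_index renumbers the rows; columns missing for a monitor become NaN.
monitors = pd.concat(frames, ignore_index=True, sort=False)
print(monitors.shape)
```

The combined frame has one row per monitor and the union of all spec columns, so any spec a monitor does not list shows up as a missing value to handle during cleaning.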
