 # &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;  **Web Scraping Car Details from Cars24.com** 
 ## &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Project Report(Team-O)


### **Objective** :  

The main objective of this mini-project is to develop skills in web scraping by extracting and analyzing car details of **HONDA** from Cars24.com for different locations. The key details to be gathered include kilometers driven, year of manufacture, fuel type, transmission, and price. This project provides hands-on experience in web scraping, essential for data extraction from online sources. By focusing on different locations, it allows for comparative analysis, revealing regional differences in car attributes and pricing. Ultimately, the project enhances data cleaning and structuring skills, which are crucial for any data analysis task.

### **Tools and Libraries :**
- **Requests**:&nbsp;&nbsp;  For sending HTTP requests to retrieve HTML content from web pages.
- **BeautifulSoup**:&nbsp;&nbsp; For parsing HTML content and extracting relevant data elements.
- **Selenium**:&nbsp;&nbsp;     For handling dynamic web content loading and scrolling.
- **Pandas**:&nbsp;&nbsp;        For structuring, cleaning, and analyzing the extracted data.
- **Seaborn**:&nbsp;&nbsp;         For creating visualizations and analyzing the data.
- **Matplotlib**:&nbsp;&nbsp;         For plotting graphs and visualizing data.
- **webdriver_manager**:&nbsp;&nbsp;  For managing the installation of the Chrome WebDriver.

### **Methodology** :

To achieve the objective of extracting and analyzing car details from Cars24.com, we followed these steps:<br>

#### 1. <u> Setup Environment : </u>
- Installed the necessary Python libraries: `requests`, `beautifulsoup4` (which provides BeautifulSoup), `selenium`, `webdriver_manager`, `pandas`, `seaborn`, and `matplotlib`.
- Configured Selenium WebDriver for dynamic web content extraction.                

#### 2. <u>URL Construction :</u>
- Developed a function to construct the appropriate URL based on the car brand and location. This included mapping location names to their corresponding location IDs used by Cars24.com.
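
The URL construction step can be sketched roughly as follows. This is a minimal illustration, not the project's actual function: the location IDs, the path layout, and the `location_id` query parameter are all hypothetical placeholders, since the report does not list the real values used by Cars24.com.

```python
from urllib.parse import quote

BASE_URL = "https://www.cars24.com"

# Hypothetical location-name -> location-ID mapping. The real IDs used by
# Cars24.com are not given in this report.
LOCATION_IDS = {
    "new-delhi": 1,
    "mumbai": 2,
    "bangalore": 3,
}


def build_listing_url(brand: str, location: str) -> str:
    """Return a listing URL for a brand in a location (assumed URL pattern)."""
    key = location.strip().lower().replace(" ", "-")
    if key not in LOCATION_IDS:
        raise ValueError(f"Unsupported location: {location!r}")
    brand_slug = quote(brand.strip().lower())
    # The path and query layout below are placeholders for illustration only.
    return (
        f"{BASE_URL}/buy-used-{brand_slug}-cars-{key}/"
        f"?location_id={LOCATION_IDS[key]}"
    )


print(build_listing_url("Honda", "New Delhi"))
```

Raising `ValueError` for an unknown location keeps the "unsupported location" case explicit, so the caller can decide whether to skip it or stop.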

#### 3. <u>Data Extraction :</u>
- Sent HTTP requests using the `requests` library to retrieve the HTML content of web pages containing car listings.
- Utilized Selenium WebDriver to handle dynamic content loading and scrolling for comprehensive data extraction.
- Parsed the HTML content using BeautifulSoup to locate and extract relevant car details such as name, kilometers driven, year of manufacture, fuel type, transmission, and price.
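
As a rough sketch of the parsing step, the example below runs BeautifulSoup over a small hand-written HTML snippet. The markup and class names (`car-card`, `car-name`, `car-specs`, `car-price`) are invented for illustration. The real Cars24.com page structure is not shown in this report and will differ.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for one listing card; not real Cars24.com markup.
SAMPLE_HTML = """
<div class="car-card">
  <h3 class="car-name">2018 Honda City</h3>
  <ul class="car-specs">
    <li>45,000 km</li>
    <li>Petrol</li>
    <li>Manual</li>
  </ul>
  <span class="car-price">₹6.25 Lakh</span>
</div>
"""


def parse_listings(html: str) -> list[dict]:
    """Extract raw (uncleaned) text fields from every listing card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.car-card"):
        name = card.select_one(".car-name")
        price = card.select_one(".car-price")
        specs = [li.get_text(strip=True) for li in card.select(".car-specs li")]
        if name is None or price is None or len(specs) < 3:
            continue  # skip cards that are missing an expected field
        rows.append({
            "name": name.get_text(strip=True),
            "kilometers": specs[0],
            "fuel_type": specs[1],
            "transmission": specs[2],
            "price": price.get_text(strip=True),
        })
    return rows


print(parse_listings(SAMPLE_HTML))
```

The same function could be given the HTML returned by `requests` or the page source from a Selenium session. The values are kept as raw strings here so that cleaning remains a separate step.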

#### 4. <u>Data Cleaning :</u>
- Developed a function to clean and format the extracted data to ensure consistency and readability.
- Converted relevant fields to appropriate data types (e.g., integers for kilometers driven and year of manufacture, floats for price).
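
The cleaning and type conversion described above might look like the sketch below. The input formats (`'45,000 km'`, `'₹6.25 Lakh'`, a year embedded in the listing name) are assumptions about how the raw text appears, and the function names are hypothetical.

```python
import re


def clean_kilometers(text: str) -> int:
    """Convert a string such as '45,000 km' to the integer 45000."""
    digits = re.sub(r"[^\d]", "", text)
    if not digits:
        raise ValueError(f"No digits found in {text!r}")
    return int(digits)


def clean_price_lakhs(text: str) -> float:
    """Convert a string such as '₹6.25 Lakh' to the float 6.25 (in lakhs)."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if match is None:
        raise ValueError(f"No price found in {text!r}")
    return float(match.group())


def clean_year(name: str) -> int:
    """Pull a four-digit year out of a name such as '2018 Honda City'."""
    match = re.search(r"\b(?:19|20)\d{2}\b", name)
    if match is None:
        raise ValueError(f"No year found in {name!r}")
    return int(match.group())


print(clean_kilometers("45,000 km"), clean_price_lakhs("₹6.25 Lakh"), clean_year("2018 Honda City"))
```

Each helper raises `ValueError` on unparseable input rather than returning a default, so malformed rows can be logged or dropped deliberately.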

#### 5. <u>Data Analysis :</u>
- Conducted descriptive analysis to summarize the extracted data.
- Visualized key insights using `seaborn` and `matplotlib` to create bar plots, count plots, histograms, and scatter plots. 
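
A minimal sketch of the descriptive analysis with pandas is shown below. The rows are illustrative values only, not data from the scraped dataset, and the column names are assumptions.

```python
import pandas as pd

# Illustrative rows only; these are not values from the scraped dataset.
df = pd.DataFrame({
    "fuel_type": ["Petrol", "Petrol", "Diesel", "CNG"],
    "transmission": ["Manual", "Automatic", "Manual", "Manual"],
    "price_lakhs": [5.0, 7.0, 6.0, 4.5],
})

# Overall summary statistics (count, mean, std, quartiles) for price.
summary = df["price_lakhs"].describe()

# Average price per category, the basis for the bar plots.
avg_by_fuel = df.groupby("fuel_type")["price_lakhs"].mean()
avg_by_transmission = df.groupby("transmission")["price_lakhs"].mean()

print(summary)
print(avg_by_fuel)
print(avg_by_transmission)
```

The same DataFrame can be passed to seaborn for plotting, for example `sns.barplot(data=df, x="fuel_type", y="price_lakhs")`, which plots the mean of `price_lakhs` per category by default.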

## **Result Analysis:**





#### 1. <u>Average Price by Fuel Type</u>:

- **Petrol Cars:** The average price is around 5.54 lakhs.
- **CNG Cars:** The average price is approximately 5.05 lakhs.
- **Diesel Cars:** The average price is about 5.86 lakhs. 


**Observation:** &nbsp;
 Diesel cars have the highest average price, followed by petrol cars. CNG cars are the least expensive on average.

#### 2. <u>Average Price by Transmission</u>:

- **Manual Transmission Cars:** The average price is around 5.28 lakhs.
- **Automatic Transmission Cars:** The average price is approximately 6.56 lakhs.

**Observation:** &nbsp; Automatic transmission cars are about 1.28 lakhs (roughly 24%) more expensive on average than manual transmission cars.

#### 3. <u>Year of Manufacture by Transmission</u>:

- **Manual Transmission Cars:** A majority of the cars are from earlier years, with a notable concentration around 2016 and 2018.
- **Automatic Transmission Cars:** There is a noticeable number of newer models, especially from 2018 onwards.

**Observation:** Manual transmission cars tend to be older models, while automatic transmission cars include a larger proportion of newer models.

#### 4. <u> Year of Manufacture by Fuel Type </u>:

 **1. Petrol Vehicles:**

- The majority of petrol vehicles were manufactured between 2012 and 2022.
- The distribution shows a peak in the years 2014, 2016, and 2018.
- There is a noticeable decline in the number of petrol vehicles manufactured after 2018.

**Observation:** The bulk of petrol vehicles were manufactured in 2014, 2016, and 2018, with these years showing the highest counts. This suggests a higher production or registration of petrol vehicles during these years. The decline after 2018 could indicate a shift in market preference or regulatory changes affecting petrol vehicle sales.

 **2. CNG Vehicles:**

- CNG vehicles have fewer entries overall.
- The distribution shows a higher concentration of CNG vehicles manufactured around 2018.
- There is minimal representation of CNG vehicles from other years.

**Observation:** The majority of CNG vehicles are concentrated around the year 2018, indicating a possible rise in popularity or incentives for CNG vehicles during this period. The lower representation in other years suggests limited production or adoption of CNG vehicles overall.

 **3. Diesel Vehicles:**

- Similar to CNG, there are fewer diesel vehicles.
- The distribution is more evenly spread out between the years 2014 and 2020, with a slight peak around 2020.

**Observation:** Diesel vehicles show a relatively even distribution between 2014 and 2020, with a slight increase around 2020. This even spread suggests a steady but limited demand or production for diesel vehicles across these years.

#### 5. <u>Price Distribution :</u>

- The distribution of car prices peaks around 6 lakhs, with the majority of cars priced between 4 and 8 lakhs.
- There are fewer cars in the lower price range (2-3 lakhs) and the higher price range (above 10 lakhs).
- The distribution shows a right skew, indicating that while most cars are moderately priced, there are a few high-priced cars that stretch the tail to the right.

**Observation:** The price distribution is right-skewed, with the majority of vehicles priced between 4 and 8 lakhs and a peak at around 6 lakhs. Most vehicles in the dataset are therefore mid-range in price, with relatively few high-end vehicles above 8 lakhs. The long tail towards higher prices shows that expensive vehicles are less common but still present in the market.
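
One simple way to check for a right skew numerically is to compare the mean with the median and to compute the moment-based skewness coefficient. The sketch below uses made-up prices, not values from the dataset, purely to show the calculation.

```python
from statistics import mean, median

# Made-up prices in lakhs; not taken from the scraped dataset.
prices_lakhs = [3.5, 4.2, 5.0, 5.8, 6.0, 6.1, 6.4, 7.0, 9.5, 12.0]

avg = mean(prices_lakhs)
mid = median(prices_lakhs)

# Population skewness: third central moment divided by the
# second central moment raised to the power 1.5.
n = len(prices_lakhs)
m2 = sum((p - avg) ** 2 for p in prices_lakhs) / n
m3 = sum((p - avg) ** 3 for p in prices_lakhs) / n
skewness = m3 / m2 ** 1.5

print(f"mean={avg:.2f}, median={mid:.2f}, skewness={skewness:.2f}")
```

A mean above the median together with a positive skewness value is consistent with a tail stretching towards higher prices.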


#### 6. <u>Kilometers Driven Distribution :</u>
- The kilometers driven distribution shows a peak around 40,000 kilometers.
- Most vehicles have been driven between 20,000 and 80,000 kilometers.
- There is a right skew in the distribution, with some vehicles having been driven up to 120,000 kilometers, but these are less common.

**Observation:** The distribution of kilometers driven shows a peak around 40,000 kilometers, with most vehicles having been driven between 20,000 and 80,000 kilometers. The right-skewed distribution indicates that while higher mileage vehicles (up to 120,000 kilometers) exist, they are less common. This suggests that the majority of vehicles in the dataset are moderately used, with relatively fewer vehicles having very high mileage.

#### 7. <u>Year of Manufacture Distribution :</u>

- The frequency of cars manufactured peaks around the years 2015, 2017, and 2019.
- There is a noticeable decline in the number of cars manufactured in the more recent years (2020-2023).
- This could suggest that fewer cars from recent years are available in the dataset, possibly due to factors like lower production numbers, higher retention by original owners, or market dynamics affecting the resale of newer cars.

**Observation:** The year of manufacture and kilometers driven are significant factors in determining the price of a car, with newer and less-driven cars generally fetching higher prices.

#### 8. <u>Price vs. Kilometers Driven :</u>

- There is a general trend where cars with lower kilometers driven tend to have higher prices, which is expected as lower mileage often indicates better condition.
- Despite the general trend, there is significant variability, with some cars having high prices despite high kilometers driven and vice versa. This suggests other factors also play a significant role in determining price, such as brand, model, condition, and year of manufacture.

**Observation:** There is considerable variability in prices for a given year of manufacture or mileage, indicating that other factors, possibly including brand, model, and condition, also influence car prices.

#### 9. <u>Price vs. Year of Manufacture :</u>

- There is a clear positive correlation between the year of manufacture and the price of the car. Newer cars generally command higher prices.
- The prices of cars from the year 2020 onwards show more variability, indicating a broader range of factors affecting the price in recent years.
- Older cars (pre-2015) tend to cluster in the lower price range (2-4 lakhs), suggesting depreciation over time.

**Observation:** The market for cars manufactured in recent years (2020-2023) appears to be more dynamic with greater price variability, which may be due to newer car models having a wider range of features and conditions, or possibly economic factors influencing car values in recent times.
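
The positive relationship between year of manufacture and price can be quantified with the Pearson correlation coefficient. The sketch below computes it from first principles on made-up values. These numbers are illustrative only and are not results from the dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("Need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("Correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Made-up (year, price in lakhs) pairs; not taken from the scraped dataset.
years = [2013, 2015, 2017, 2019, 2021]
prices_lakhs = [3.0, 4.2, 5.5, 6.1, 8.0]

r = pearson_r(years, prices_lakhs)
print(f"r = {r:.3f}")
```

A value close to +1 indicates that newer cars tend to be priced higher. With a pandas DataFrame, `df["year"].corr(df["price"])` gives the same coefficient.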

## **Challenges Faced:**
#### <u>Dynamic Content Loading</u>:

- Many web pages, including Cars24.com, use JavaScript to load content dynamically as the user scrolls. This required the use of Selenium WebDriver to simulate scrolling and ensure all car listings were loaded before extraction.
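
A common way to simulate scrolling is to scroll to the bottom repeatedly until the page height stops growing. The helper below is a sketch of that idea, not the project's exact code. It accepts any object that exposes Selenium's `execute_script` method, and the pause length and round limit are arbitrary assumptions.

```python
import time


def scroll_to_bottom(driver, pause_seconds: float = 1.0, max_rounds: int = 50) -> int:
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # give lazy-loaded listings time to render
        rounds += 1
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # no new content appeared, so assume the end of the list
        last_height = new_height
    return rounds
```

With Selenium and `webdriver_manager` installed, a driver would typically be created with `webdriver.Chrome(service=Service(ChromeDriverManager().install()))`, and `driver.page_source` can then be handed to BeautifulSoup once scrolling finishes. The `max_rounds` cap guards against pages that keep loading indefinitely.
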
#### <u>Data Consistency</u>:
- The format and presentation of car details varied across different listings. Standardizing the data, such as converting prices and kilometers driven to consistent formats, posed a challenge.
#### <u>Error Handling</u>:
- Various issues such as missing data, network errors, and unsupported locations needed robust error handling mechanisms to prevent the program from crashing and to ensure meaningful data extraction.
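
One way to keep a single failing location from stopping the whole run is to catch errors per location and record them. The sketch below assumes a hypothetical `fetch_listings(location)` callable that raises `ValueError` for an unsupported location. Network failures are caught as `OSError`, which covers the built-in connection errors as well as the exceptions raised by `requests`.

```python
def scrape_all(locations, fetch_listings):
    """Collect listings per location, recording failures instead of crashing."""
    results = {}
    errors = {}
    for location in locations:
        try:
            results[location] = fetch_listings(location)
        except ValueError as exc:
            # Hypothetical convention: fetch_listings raises ValueError
            # when the location has no known ID.
            errors[location] = f"unsupported location: {exc}"
        except OSError as exc:
            # Built-in ConnectionError/TimeoutError and requests'
            # RequestException all derive from OSError.
            errors[location] = f"network error: {exc}"
    return results, errors
```

Returning the errors alongside the results makes it easy to report which locations were skipped and why.
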
#### <u>Performance and Efficiency</u>:
- Extracting data from multiple locations involved significant processing time, particularly due to the need for dynamic scrolling and loading. This required optimizing the extraction process for better performance.
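
The report does not say which optimization was applied. One possible approach, shown here only as a sketch, is to scrape several locations concurrently with a thread pool, since most of the time is spent waiting on page loads. The `fetch_listings` callable is hypothetical and would need to create its own browser session per call, because a single WebDriver instance should not be shared between threads.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_concurrently(locations, fetch_listings, max_workers: int = 4) -> dict:
    """Run fetch_listings for each location in parallel threads."""
    locations = list(locations)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input locations.
        listings = list(pool.map(fetch_listings, locations))
    return dict(zip(locations, listings))
```

Keeping `max_workers` small limits both local resource use and the load placed on the website.
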
#### <u>Parsing Variability</u>:
- Differences in HTML structure across different car listings required careful parsing logic to correctly identify and extract the necessary details without missing or incorrectly interpreting any data.

## **Future Directions:**
- *Expansion:* &nbsp; &nbsp; Include more car brands and additional locations for a broader analysis.
- *Automation:*  &nbsp; &nbsp;Implement scheduled tasks for regular data updates.
- *Advanced Analysis:* &nbsp; &nbsp; Incorporate additional attributes and perform deeper statistical analysis.
- *User Interface:* &nbsp; &nbsp; Develop an interactive interface for users to query and visualize data.

## **Conclusion:**

This project provided valuable hands-on experience in web scraping, data cleaning, and data analysis, and highlighted the importance of robust data extraction techniques, efficient data cleaning processes, and insightful data visualization methods.

By extracting and analyzing car details from Cars24.com across various cities in India, we uncovered regional trends in car attributes and pricing. The analysis revealed significant variations in kilometers driven, year of manufacture, fuel type, transmission, and price, reflecting diverse market dynamics. These insights are useful for potential buyers, sellers, and market analysts seeking to understand regional preferences, market trends, and pricing strategies in the used car industry. Further refinement and expansion of the project could offer more nuanced insights and broader applicability in automotive market analysis.

## **Key Takeaways :**

- *Technical Skills:* &nbsp; &nbsp; Enhanced understanding of web scraping tools (Requests, BeautifulSoup, Selenium) and data analysis libraries (Pandas, Seaborn, Matplotlib).
- *Problem-Solving:* &nbsp; &nbsp; Developed problem-solving skills to address challenges related to dynamic content loading, data consistency, and performance optimization.
- *Data Insights:* &nbsp; &nbsp; Gained practical experience in extracting, cleaning, and analyzing real-world data, leading to meaningful insights into the used car market.
- *Future Directions:* &nbsp; &nbsp; Identified potential areas for expanding the project, such as increasing brand and location coverage, automating data updates, and developing a user-friendly interface for custom data queries.