# Exploratory Data Analysis

The selected dataset is the "Wood Prices Dataset", obtained from Kaggle at https://www.kaggle.com/datasets/swarajkhan/wood-prices-dataset.

**About:** This comprehensive dataset offers a detailed exploration of wood types from different regions, showcasing their market prices, popularity ratings, and various attributes. With over a thousand entries, this dataset is a valuable resource for understanding the dynamics of the international wood market, identifying trends in pricing, and uncovering factors that influence wood popularity. Whether you're a researcher, analyst, or enthusiast, this dataset provides a wealth of information for studying the global wood industry.

In [30]:
import numpy as np
import pandas as pd

In [39]:
# Load and view dataset.
df = pd.read_csv('./Data/wood_prices.csv')
print(f'Columns = {list(df.columns)}.')
df

Columns = ['Wood Type', 'Country', 'Price (USD)', 'Supply Source', 'Quality Rating', 'Popularity', 'Demand Level', 'Availability'].


Unnamed: 0,Wood Type,Country,Price (USD),Supply Source,Quality Rating,Popularity,Demand Level,Availability
0,Maple,China,166.59,Imported,Medium,3,Medium,Moderate
1,Rosewood,South Africa,144.71,Local,Low,6,Medium,Moderate
2,Rosewood,South Africa,216.92,Local,Medium,2,Medium,Abundant
3,Oak,Russia,130.13,Imported,Medium,5,Low,Abundant
4,Bamboo,Australia,114.66,Local,Medium,8,Low,Abundant
...,...,...,...,...,...,...,...,...
995,Rosewood,Indonesia,151.81,Local,Medium,2,Medium,Limited
996,Pine,Germany,230.97,Imported,Low,4,Low,Abundant
997,Maple,Indonesia,200.87,Local,Low,4,High,Limited
998,Maple,South Africa,227.75,Imported,Low,8,High,Moderate


In [37]:
# View the distinct values of each column and how often each occurs.
for column in df.columns:
    print(f'\nColumn "{column}":')
    vc = df[column].value_counts()
    print(vc)
    # Number of distinct values in this column.
    print(f'Count = {len(vc)}')


Column "Wood Type":
Wood Type
Rosewood    134
Teak        133
Oak         130
Maple       128
Mahogany    124
Pine        121
Cedar       118
Bamboo      112
Name: count, dtype: int64
Count = 8

Column "Country":
Country
Brazil          125
Australia       110
Germany         110
Russia          108
China            99
India            98
Indonesia        96
USA              95
Canada           82
South Africa     77
Name: count, dtype: int64
Count = 10

Column "Price (USD)":
Price (USD)
154.50    3
148.61    3
89.75     2
249.90    2
199.30    2
         ..
85.45     1
195.80    1
129.52    1
119.82    1
122.71    1
Name: count, Length: 971, dtype: int64
Count = 971

Column "Supply Source":
Supply Source
Imported    512
Local       488
Name: count, dtype: int64
Count = 2

Column "Quality Rating":
Quality Rating
Medium    366
High      321
Low       313
Name: count, dtype: int64
Count = 3

Column "Popularity":
Popularity
8     114
3     104
10    104
4     104
6     103
5     100
2   

<font color='#8888ff'>OBSERVATION:</font> This dataset has 8 columns and 1000 rows.

|Attribute|Type|Meaning|
|---|---|---|
|Wood Type|Nominal Categorical Qualitative|Name of type of wood.|
|Country|Nominal Categorical Qualitative|Name of country.|
|Price (USD)|Interval/Ratio Continuous Quantitative|Market price of wood in US Dollars.|
|Supply Source|Nominal Categorical Qualitative|Whether the wood is imported or locally sourced.|
|Quality Rating|Ordinal Categorical Qualitative|Quality of wood (low, medium, high).|
|Demand Level|Ordinal Categorical Qualitative|Demand for wood (low, medium, high).|
|Availability|Ordinal Categorical Qualitative|How readily available the wood is (limited, moderate, abundant).|
|Popularity|Discrete Quantitative|Popularity of the wood on an integer scale from 1 to 10.|
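
The ordinal attributes in the table can be encoded so that pandas respects their ranking. The following is a minimal sketch, not part of the original analysis: it uses a small `sample` frame copied from the first rows shown earlier, and the level orderings in `ordinal_levels` are assumptions read off the "Meaning" column.

```python
import pandas as pd

# Small sample copied from the first rows of the dataset shown earlier;
# the same conversion could be applied to the full `df`.
sample = pd.DataFrame({
    'Wood Type': ['Maple', 'Rosewood', 'Rosewood', 'Oak', 'Bamboo'],
    'Quality Rating': ['Medium', 'Low', 'Medium', 'Medium', 'Medium'],
    'Demand Level': ['Medium', 'Medium', 'Medium', 'Low', 'Low'],
    'Availability': ['Moderate', 'Moderate', 'Abundant', 'Abundant', 'Abundant'],
})

# Assumed orderings, lowest to highest, for each ordinal column.
ordinal_levels = {
    'Quality Rating': ['Low', 'Medium', 'High'],
    'Demand Level': ['Low', 'Medium', 'High'],
    'Availability': ['Limited', 'Moderate', 'Abundant'],
}

for column, levels in ordinal_levels.items():
    dtype = pd.CategoricalDtype(categories=levels, ordered=True)
    sample[column] = sample[column].astype(dtype)

# Ordered categoricals compare and sort by rank rather than alphabetically.
at_least_medium = sample[sample['Quality Rating'] >= 'Medium']
print(at_least_medium['Wood Type'].tolist())
print(sample['Availability'].min())
```

With this encoding, sorting or filtering on these columns follows the low-to-high order instead of string order, which would otherwise place "High" before "Low".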

In [60]:
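# Count rows per country after dropping the price and removing rows whose remaining attributes are identical.
df.drop(columns=['Price (USD)']).drop_duplicates()['Country'].value_counts()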

Country
Brazil          123
Australia       108
Germany         108
Russia          106
China            99
India            97
USA              95
Indonesia        94
Canada           81
South Africa     76
Name: count, dtype: int64
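
Some of these counts are lower than the raw per-country counts (Brazil falls from 125 to 123, for example), which suggests that a few rows share every attribute except price. One way to inspect such rows is `DataFrame.duplicated`. The sketch below uses a hypothetical `toy` frame whose prices are invented for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical mini-frame: the prices are made up for illustration,
# and only the column names match the dataset.
toy = pd.DataFrame({
    'Wood Type': ['Cedar', 'Cedar', 'Oak'],
    'Country': ['Indonesia', 'Indonesia', 'India'],
    'Price (USD)': [100.0, 120.0, 90.0],
    'Popularity': [2, 2, 1],
})

attributes = toy.drop(columns=['Price (USD)'])

# keep=False flags every member of a duplicate group, not just the repeats.
is_repeated = attributes.duplicated(keep=False)
print(toy[is_repeated])

# Number of rows that drop_duplicates() would discard
# (all but the first row of each duplicate group).
n_discarded = int(attributes.duplicated().sum())
print(n_discarded)
```

Applied to the full `df`, the same mask would show whether the repeated attribute combinations differ only in price.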

In [59]:
# Count how often each wood type appears among the rows for Russia.
df.loc[df['Country'] == 'Russia', 'Wood Type'].value_counts()

Wood Type
Rosewood    19
Mahogany    17
Bamboo      15
Teak        14
Pine        14
Cedar       12
Oak         10
Maple        7
Name: count, dtype: int64

In [68]:
# List the distinct non-price attribute combinations for bamboo from China.
df[
    (df['Wood Type'] == 'Bamboo')
    & (df['Country'] == 'China')
].drop(columns=['Price (USD)']).drop_duplicates()

Unnamed: 0,Wood Type,Country,Supply Source,Quality Rating,Popularity,Demand Level,Availability
60,Bamboo,China,Local,Medium,10,Medium,Limited
81,Bamboo,China,Imported,Medium,3,High,Moderate
215,Bamboo,China,Local,Medium,10,High,Limited
379,Bamboo,China,Local,High,8,Low,Abundant
422,Bamboo,China,Local,Low,9,Low,Limited
542,Bamboo,China,Imported,Medium,10,Low,Abundant
585,Bamboo,China,Imported,Medium,8,High,Limited
612,Bamboo,China,Imported,Low,6,High,Abundant
837,Bamboo,China,Local,Low,4,Medium,Limited
899,Bamboo,China,Imported,Low,8,Medium,Limited


In [11]:
df['Country'].value_counts()

Country
Brazil          125
Australia       110
Germany         110
Russia          108
China            99
India            98
Indonesia        96
USA              95
Canada           82
South Africa     77
Name: count, dtype: int64

In [21]:
df.columns

Index(['Wood Type', 'Country', 'Price (USD)', 'Supply Source',
       'Quality Rating', 'Popularity', 'Demand Level', 'Availability'],
      dtype='object')

In [27]:
# Count how often each combination of non-price attributes occurs.
df.drop(columns=['Price (USD)']).value_counts()

Wood Type  Country       Supply Source  Quality Rating  Popularity  Demand Level  Availability
Cedar      Indonesia     Local          Low             2           Low           Limited         2
           Australia     Imported       Low             6           Medium        Abundant        2
Oak        India         Local          Low             1           Medium        Limited         2
Mahogany   Russia        Local          Low             1           High          Moderate        2
Rosewood   Russia        Imported       Low             8           Low           Limited         2
                                                                                                 ..
Mahogany   South Africa  Local          Medium          10          Low           Abundant        1
           USA           Imported       Low             1           Low           Moderate        1
                                                        8           Medium        Limited         1
     

In [31]:
# Show all rows for maple from India.
df[np.logical_and(df['Country'] == 'India', df['Wood Type'] == 'Maple')]

Unnamed: 0,Wood Type,Country,Price (USD),Supply Source,Quality Rating,Popularity,Demand Level,Availability
5,Maple,India,149.07,Imported,Low,6,Low,Moderate
42,Maple,India,121.48,Imported,Low,9,Low,Limited
73,Maple,India,220.31,Local,Medium,3,Low,Moderate
146,Maple,India,222.8,Local,Medium,7,High,Abundant
333,Maple,India,170.23,Local,High,8,Medium,Limited
426,Maple,India,126.93,Local,High,3,Low,Moderate
448,Maple,India,135.16,Imported,High,5,Medium,Moderate
602,Maple,India,83.9,Imported,High,3,High,Abundant
688,Maple,India,150.41,Imported,Low,2,Low,Abundant
720,Maple,India,159.59,Local,Low,1,High,Limited
