
In [None]:
from glob import glob
files = sorted(glob(pathname='/kaggle/input/animal-welfare/*.csv'))
files

In [None]:
import pandas as pd
from plotly.express import bar

for filename in files[:3]:
    # Drop the identifier columns, then plot the first remaining column against the second.
    df = pd.read_csv(filepath_or_buffer=filename, thousands=',').drop(columns=['Code', 'Year'])
    bar(data_frame=df, x=df.columns[0], y=df.columns[1], log_y=True).show()

We use a log scale for the y axis because the quantities differ by orders of magnitude; on a linear axis the smaller bars would be invisible.
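
As a rough check on whether a log axis is warranted, one can measure how many orders of magnitude separate the smallest and largest positive values. The sketch below is only illustrative: the numbers are invented rather than taken from the dataset, and `orders_of_magnitude` is a hypothetical helper, not part of pandas or plotly.

```python
import math

import pandas as pd


def orders_of_magnitude(values: pd.Series) -> float:
    """Return log10(max / min) over the strictly positive values."""
    positive = values[values > 0]
    return math.log10(positive.max() / positive.min())


# Illustrative values only; these are NOT taken from the animal-welfare CSVs.
toy = pd.Series([2_000, 50_000, 3_000_000, 2_000_000_000])
spread = orders_of_magnitude(toy)
print(f"spread: {spread:.1f} orders of magnitude")
```

A spread of more than two or three orders of magnitude is usually a sign that a linear axis will flatten most of the bars.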

In [None]:
laying_df = pd.read_csv(filepath_or_buffer=files[3], thousands=',')
laying_df['more cage-free'] = laying_df['Number of cage-free hens'] > laying_df['Number of hens in cages']
laying_df.head()

In [None]:
from plotly.express import scatter
scatter(data_frame=laying_df, x='Number of hens in cages', y='Number of cage-free hens', color='Year', hover_name='Entity', log_x=True,
       symbol='more cage-free').update_layout(legend_orientation='h')

Cage-free egg production is not prevalent anywhere except in a few European countries.

In [None]:
from plotly.express import pie
# Label each row directly so the labels do not depend on the order in which value_counts() returns its bins.
has_cage_free = laying_df['Number of cage-free hens'].dropna().gt(0).map({True: 'more than zero', False: 'zero'})
pie_df = has_cage_free.value_counts().reset_index()
pie(data_frame=pie_df, values='count', names='Number of cage-free hens', title='Entities with cage-free hens').update_traces(hoverinfo='label+percent', textinfo='value')

There are eighteen countries in the dataset with no cage-free hens. That seems high.

In [None]:
bar(data_frame=laying_df['Year'].value_counts().to_frame().reset_index().sort_values(by='Year'),
    x='Year', y='count')

Some of our data is ten years old.

In [None]:
year_free_df = laying_df[['Year', 'Number of cage-free hens']].copy()
year_free_df['nonzero cage-free'] = year_free_df['Number of cage-free hens'] > 0
bar(data_frame=year_free_df.drop(columns='Number of cage-free hens').value_counts().to_frame().reset_index(), x='Year', y='count', color='nonzero cage-free')

It is not surprising that the countries with no cage-free hens are from the older surveys.
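
One way to check this kind of claim numerically is to compute, for each survey year, the share of rows that report zero cage-free hens. This is a minimal sketch on an invented toy frame that only borrows the notebook's column names; the `zero cage-free` column and the values are assumptions for illustration, not results from the dataset.

```python
import pandas as pd

# Toy rows for illustration only; they are NOT taken from the dataset.
toy_df = pd.DataFrame({
    'Year': [2010, 2010, 2015, 2019, 2019],
    'Number of cage-free hens': [0, 0, 120, 0, 4500],
})

# Flag rows with no cage-free hens (hypothetical column name).
toy_df['zero cage-free'] = toy_df['Number of cage-free hens'] == 0

# Rows per year, split by the flag.
table = pd.crosstab(index=toy_df['Year'], columns=toy_df['zero cage-free'])

# The mean of a boolean column is the fraction of True values.
zero_share = toy_df.groupby('Year')['zero cage-free'].mean()
print(table)
print(zero_share)
```

Applied to `laying_df`, a `zero_share` that falls as `Year` increases would support the observation above.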

In [None]:
crustaceans_df = pd.read_csv(filepath_or_buffer=files[6], thousands=',')
crustaceans_df.head()

In [None]:
crustaceans_df['Year'].value_counts()

It looks like every year has the same number of data points, so we can pick any of them; let's pick the most recent.
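
The same check can be done programmatically instead of by eye. The sketch below runs on an invented toy frame with `Entity` and `Year` columns like the notebook's; the rows are assumptions for illustration. It confirms that every year has the same row count and derives the most recent year rather than hard-coding it.

```python
import pandas as pd

# Toy frame for illustration only; the rows are NOT taken from the dataset.
toy_df = pd.DataFrame({
    'Entity': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Year': [2015, 2015, 2016, 2016, 2017, 2017],
})

counts_per_year = toy_df['Year'].value_counts()

# If all years have the same number of rows, there is exactly one distinct count.
balanced = counts_per_year.nunique() == 1

# Most recent year present in the data.
latest_year = int(toy_df['Year'].max())
print(balanced, latest_year)
```

When two frames are merged on a common year, as happens with the fish data further down, it is worth confirming that the chosen year exists in both.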

In [None]:
fish_df = pd.read_csv(filepath_or_buffer=files[7], thousands=',')
fish_df.head()

In [None]:
year = 2017
crustaceans_year_df = crustaceans_df[crustaceans_df['Year'] == year].drop(columns=['Code', 'Year'])
fish_year_df = fish_df[fish_df['Year'] == year].drop(columns=['Code', 'Year'])
seafood_df = crustaceans_year_df.merge(right=fish_year_df, on='Entity', how='inner')
seafood_df = seafood_df.drop(columns=[item for item in seafood_df.columns if 'bound' in item])
seafood_df['more fish?'] = seafood_df['Estimated number of farmed fish'] > seafood_df['Estimated number of farmed decapod crustaceans']
seafood_df.head()

In [None]:
scatter(data_frame=seafood_df, x='Estimated number of farmed fish', y='Estimated number of farmed decapod crustaceans', hover_name='Entity',
       log_x=True, log_y=True, color='more fish?',)

It's important to remember that crustaceans and fish tend to have rather different food yields and this is a count of animals, not food.