# Akeneo Connector

The Akeneo Connector provides basic access to the Akeneo-PIM data. It also caches data for efficiency and returns cleaned, condensed values from Akeneo-PIM, which form the basis for the clustering.
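
The caching idea can be illustrated with a minimal sketch. It is not the connector's actual implementation: `CachedFetcher` and `fake_fetch` are hypothetical names introduced here, and the sketch only assumes that fetching an endpoint is expensive and that its result can be reused.

```python
from typing import Callable


class CachedFetcher:
    """Hypothetical helper that memoizes the result of an expensive fetch per endpoint."""

    def __init__(self, fetch: Callable[[str], list[dict]]):
        self._fetch = fetch
        self._cache: dict[str, list[dict]] = {}

    def get(self, endpoint: str) -> list[dict]:
        # Only call the backend the first time an endpoint is requested.
        if endpoint not in self._cache:
            self._cache[endpoint] = self._fetch(endpoint)
        return self._cache[endpoint]


calls: list[str] = []


def fake_fetch(endpoint: str) -> list[dict]:
    # Stand-in for a real API request; records every call it receives.
    calls.append(endpoint)
    return [{"code": "master", "endpoint": endpoint}]


fetcher = CachedFetcher(fake_fetch)
first = fetcher.get("categories")
second = fetcher.get("categories")  # served from the cache, no second fetch
print(len(calls))
```

A dictionary keyed by endpoint is the simplest option; a real cache would likely also need invalidation or persistence, which this sketch leaves out.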

In [1]:
import shutil

from IPython.display import display
import pandas as pd

import akeneo
import config

In [2]:
connector, client = akeneo.create_from_env()

## Categories and Families

Categories and families are clusters that humans have assigned to the products. Families also specify which attributes are available for the products of that family.

In [3]:
funcs = [
    connector.get_categories,
    connector.get_families,
]

for func in funcs:
    df = pd.DataFrame(func())
    display(df)

Unnamed: 0,code,labels,parent
0,master,Master catalog,
1,tvs_projectors,TVs and projectors,master
2,pc_monitors,PC Monitors,tvs_projectors
3,led_tvs,LED TVs,tvs_projectors
4,cameras,Cameras,master
...,...,...,...
163,suppliers,Suppliers,
164,supplier_mongo,Mongo,suppliers
165,supplier_zaro,Zaro,suppliers
166,supplier_the_tootles,The Tootles,suppliers


Unnamed: 0,code,labels,attributes
0,accessories,Accessories,"[brand, collection, color, composition, descri..."
1,camcorders,Camcorders,"[description, image_stabilizer, name, optical_..."
2,clothing,Clothing,"[brand, care_instructions, collection, color, ..."
3,digital_cameras,Digital cameras,"[auto_exposure, auto_focus_assist_beam, auto_f..."
4,headphones,Headphones,"[description, headphone_connectivity, name, pi..."
5,laser_led_printers,Laser and LED printers,"[description, maximum_print_size, name, pictur..."
6,led_tvs,LED TVs,"[description, display_diagonal, name, picture,..."
7,loudspeakers,Loudspeakers,"[description, name, picture, power_requirement..."
8,mp3_players,MP3 players,"[description, name, picture, power_requirement..."
9,mugs,Mugs,"[container_material, description, main_color, ..."
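
Because every category refers to its parent by code, the full position of a category in the tree can be recovered by walking the `parent` column upwards. The following is a minimal sketch on the first five rows shown above; `category_path` and the `path` column are hypothetical additions, not part of the connector.

```python
import pandas as pd

# First rows of the categories table shown above.
categories = pd.DataFrame(
    [
        {"code": "master", "labels": "Master catalog", "parent": None},
        {"code": "tvs_projectors", "labels": "TVs and projectors", "parent": "master"},
        {"code": "pc_monitors", "labels": "PC Monitors", "parent": "tvs_projectors"},
        {"code": "led_tvs", "labels": "LED TVs", "parent": "tvs_projectors"},
        {"code": "cameras", "labels": "Cameras", "parent": "master"},
    ]
)

parent_of = dict(zip(categories["code"], categories["parent"]))


def category_path(code: str) -> list[str]:
    """Return the category codes from the root down to `code`."""
    path: list[str] = []
    seen: set[str] = set()
    current = code
    # Stop at the root (missing parent) and guard against accidental cycles.
    while current is not None and not pd.isna(current) and current not in seen:
        seen.add(current)
        path.append(current)
        current = parent_of.get(current)
    return path[::-1]


categories["path"] = categories["code"].map(lambda c: " / ".join(category_path(c)))
print(categories[["code", "path"]])
```

Such a path could serve as a hierarchical reference label when comparing computed clusters with the human-assigned ones.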


## Attributes

The attributes offer constraints and additional metadata that may be used to process the product values before clustering.

In [4]:
res = client.get_list("pim_api_attribute_list")
df = pd.DataFrame(res)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 77 entries, 0 to 76
Data columns (total 29 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   code                    77 non-null     object
 1   type                    77 non-null     object
 2   group                   77 non-null     object
 3   unique                  77 non-null     bool  
 4   useable_as_grid_filter  77 non-null     bool  
 5   allowed_extensions      77 non-null     object
 6   metric_family           4 non-null      object
 7   default_metric_unit     4 non-null      object
 8   reference_data_name     0 non-null      object
 9   available_locales       77 non-null     object
 10  max_characters          0 non-null      object
 11  validation_rule         0 non-null      object
 12  validation_regexp       0 non-null      object
 13  wysiwyg_enabled         1 non-null      object
 14  number_min              0 non-null      object
 15  number_m

In [5]:
res = connector.get_attributes()
df = pd.DataFrame(res)
df

Unnamed: 0,code,type,labels,localizable,scopable,unique,group,group_labels,sort_order,allowed_extensions,...,max_file_size,metric_family,minimum_input_length,negative_allowed,number_min,number_max,reference_data_name,validation_rule,validation_regexp,wysiwyg_enabled
0,auto_exposure,AttributeType.BOOL,Auto exposure,False,False,False,technical,Technical,39,[],...,,,,,,,,,,
1,auto_focus_assist_beam,AttributeType.BOOL,Auto focus beam,False,False,False,technical,Technical,34,[],...,,,,,,,,,,
2,auto_focus_lock,AttributeType.BOOL,Auto focus lock,False,False,False,technical,Technical,33,[],...,,,,,,,,,,
3,auto_focus_modes,AttributeType.TEXT,Auto focus modes,True,False,False,technical,Technical,31,[],...,,,,,,,,,,
4,auto_focus_points,AttributeType.NUMBER,Auto focus points,False,False,False,technical,Technical,32,[],...,,,,False,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
72,variation_image,AttributeType.IMAGE,Variant picture,False,False,False,medias,Media,20,"[png, jpeg, jpg]",...,,,,,,,,,,
73,variation_name,AttributeType.TEXT,Variant Name,True,False,False,marketing,Marketing,4,[],...,,,,,,,,,,
74,viewing_area,AttributeType.METRIC,Effective viewing area,False,False,False,technical,Technical,20,[],...,,Length,,False,,,,,,
75,wash_temperature,AttributeType.SELECT_SINGLE,Wash temperature,False,False,False,product,Product,6,[],...,,,,,,,,,,
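
One way the attribute metadata could guide preprocessing is to select columns by attribute type and convert them accordingly. The sketch below is an assumption-laden illustration: the local `AttributeType` enum is a hypothetical stand-in for the type shown in the table above, and the sample product values are made up.

```python
from enum import Enum

import pandas as pd


class AttributeType(Enum):
    """Hypothetical stand-in for the attribute type seen in the table above."""

    BOOL = "bool"
    TEXT = "text"
    NUMBER = "number"
    METRIC = "metric"


# Attribute codes and types taken from the table above.
attributes = [
    {"code": "auto_exposure", "type": AttributeType.BOOL},
    {"code": "auto_focus_modes", "type": AttributeType.TEXT},
    {"code": "auto_focus_points", "type": AttributeType.NUMBER},
    {"code": "viewing_area", "type": AttributeType.METRIC},
]

# Attributes whose values should be treated as numbers.
numeric_types = {AttributeType.NUMBER, AttributeType.METRIC}
numeric_codes = [a["code"] for a in attributes if a["type"] in numeric_types]

# Made-up product values, stored as strings as they might arrive from an API.
values = pd.DataFrame(
    {
        "auto_exposure": [True, False],
        "auto_focus_modes": ["single", "continuous"],
        "auto_focus_points": ["9", "51"],
        "viewing_area": ["12.5", None],
    }
)

for code in numeric_codes:
    # Unparseable or missing entries become NaN instead of raising.
    values[code] = pd.to_numeric(values[code], errors="coerce")

print(values.dtypes)
```

The same pattern extends to other types, for example parsing dates or one-hot encoding select attributes.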


## Products

Products are the most important structure. They carry several pieces of metadata, such as the categories they belong to.

Embedded within them are the product values. These are defined generically by the attributes created in Akeneo-PIM.

In [6]:
res = connector.get_products()
df = pd.DataFrame(res)
df

Unnamed: 0,identifier,enabled,family,categories,groups,created,updated,parent,values
0,1111111171,True,accessories,"[master_accessories_bags, print_accessories, s...",[],2022-01-12 12:08:58+00:00,2022-01-12 12:08:59+00:00,,{'image': 'f/e/9/6/fe960148cdaa746093eb734cd82...
1,13620748,True,led_tvs,"[led_tvs, samsung, tvs_projectors_sales]",[],2022-01-12 12:08:58+00:00,2022-01-12 12:08:59+00:00,,"{'name': 'Samsung UE40ES5500PXZT LED TV', 'dis..."
2,15554974,True,webcams,"[cameras_sales, logitech, webcams]",[],2022-01-12 12:08:58+00:00,2022-01-12 12:08:59+00:00,,"{'maximum_video_resolution': '640_x_480', 'nam..."
3,12249740,True,scanners,"[avision, print_scan_sales, scanners]",[],2022-01-12 12:09:22+00:00,2022-01-12 12:09:22+00:00,,{'picture': 'c/2/d/f/c2df1e701322f576b6e477653...
4,1111111130,True,clothing,"[master_men_blazers_deals, supplier_zaro]",[],2022-01-12 12:09:24+00:00,2022-01-12 12:09:24+00:00,apollon_yellow,"{'size': 'xs', 'color': 'yellow', 'supplier': ..."
...,...,...,...,...,...,...,...,...,...
1234,1111111125,True,clothing,"[master_men_blazers_deals, supplier_zaro]",[],2022-01-12 12:09:22+00:00,2022-01-12 12:09:22+00:00,apollon_pink,"{'size': 'xs', 'color': 'pink', 'supplier': 'z..."
1235,1111111126,True,clothing,"[master_men_blazers_deals, supplier_zaro]",[],2022-01-12 12:09:22+00:00,2022-01-12 12:09:22+00:00,apollon_red,"{'size': 'xxl', 'color': 'red', 'supplier': 'z..."
1236,1111111127,True,clothing,"[master_men_blazers_deals, supplier_zaro]",[],2022-01-12 12:09:22+00:00,2022-01-12 12:09:22+00:00,apollon_red,"{'size': 'm', 'color': 'red', 'supplier': 'zar..."
1237,1111111128,True,clothing,"[master_men_blazers_deals, supplier_zaro]",[],2022-01-12 12:09:22+00:00,2022-01-12 12:09:22+00:00,apollon_yellow,"{'size': 'm', 'color': 'yellow', 'supplier': '..."
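
Since each product carries its values as a dictionary keyed by attribute code, the values can be spread into one column per attribute. The sketch below shows the idea on two shortened products from the table above. It is not necessarily how `akeneo.Product.to_products_values` works; the `sku` column name is borrowed from the exported values table.

```python
import pandas as pd

# Two products from the table above, reduced to a few fields and values.
products = [
    {
        "identifier": "13620748",
        "family": "led_tvs",
        "values": {"name": "Samsung UE40ES5500PXZT LED TV", "display_diagonal": 40.0},
    },
    {
        "identifier": "15554974",
        "family": "webcams",
        "values": {"name": "Logitech C170", "maximum_video_resolution": "640_x_480"},
    },
]

# One row per product: the identifier plus one column per attribute code.
rows = [{"sku": p["identifier"], **p["values"]} for p in products]
wide = pd.DataFrame(rows)

# Attributes a product does not have show up as missing values.
print(wide)
```

The resulting table is sparse, because each family only uses a subset of all attributes.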


## Storing Cleaned Data to CSV Files

Each cleaned dataset is written to its own CSV file. The product values are also exported separately.

In [7]:
csv_dir = config.dir_data / "02-akeneo-data-cleaned"
shutil.rmtree(csv_dir, ignore_errors=True)
csv_dir.mkdir(parents=True, exist_ok=True)

def print_and_save_data(name: str, data: list[dict]) -> None:
    """Write the records to `<name>.csv` and display the first rows."""
    df = pd.DataFrame(data)
    df.to_csv(csv_dir / f"{name}.csv", index=False)
    print(f"--- {name} --------------------------------------")
    display(df.head())

In [8]:
# Each case pairs an output file name with the connector method providing its data.
cases = [
    ("attributes", connector.get_attributes),
    ("categories", connector.get_categories),
    ("families", connector.get_families),
    ("products", connector.get_products),
]

for name, func in cases:
    data = func()
    print_and_save_data(name, data)

    if name == "products":
        # The product values are flattened and exported as a separate file.
        attr = connector.get_attributes()
        prods = connector.get_products()
        values = akeneo.Product.to_products_values(attr, prods)
        print_and_save_data(f"{name}__values", values)

--- attributes --------------------------------------


Unnamed: 0,code,type,labels,localizable,scopable,unique,group,group_labels,sort_order,allowed_extensions,...,max_file_size,metric_family,minimum_input_length,negative_allowed,number_min,number_max,reference_data_name,validation_rule,validation_regexp,wysiwyg_enabled
0,auto_exposure,AttributeType.BOOL,Auto exposure,False,False,False,technical,Technical,39,[],...,,,,,,,,,,
1,auto_focus_assist_beam,AttributeType.BOOL,Auto focus beam,False,False,False,technical,Technical,34,[],...,,,,,,,,,,
2,auto_focus_lock,AttributeType.BOOL,Auto focus lock,False,False,False,technical,Technical,33,[],...,,,,,,,,,,
3,auto_focus_modes,AttributeType.TEXT,Auto focus modes,True,False,False,technical,Technical,31,[],...,,,,,,,,,,
4,auto_focus_points,AttributeType.NUMBER,Auto focus points,False,False,False,technical,Technical,32,[],...,,,,False,,,,,,


--- categories --------------------------------------


Unnamed: 0,code,labels,parent
0,master,Master catalog,
1,tvs_projectors,TVs and projectors,master
2,pc_monitors,PC Monitors,tvs_projectors
3,led_tvs,LED TVs,tvs_projectors
4,cameras,Cameras,master


--- families --------------------------------------


Unnamed: 0,code,labels,attributes
0,accessories,Accessories,"[brand, collection, color, composition, descri..."
1,camcorders,Camcorders,"[description, image_stabilizer, name, optical_..."
2,clothing,Clothing,"[brand, care_instructions, collection, color, ..."
3,digital_cameras,Digital cameras,"[auto_exposure, auto_focus_assist_beam, auto_f..."
4,headphones,Headphones,"[description, headphone_connectivity, name, pi..."


--- products --------------------------------------


Unnamed: 0,identifier,enabled,family,categories,groups,created,updated,parent,values
0,1111111171,True,accessories,"[master_accessories_bags, print_accessories, s...",[],2022-01-12 12:08:58+00:00,2022-01-12 12:08:59+00:00,,{'image': 'f/e/9/6/fe960148cdaa746093eb734cd82...
1,13620748,True,led_tvs,"[led_tvs, samsung, tvs_projectors_sales]",[],2022-01-12 12:08:58+00:00,2022-01-12 12:08:59+00:00,,"{'name': 'Samsung UE40ES5500PXZT LED TV', 'dis..."
2,15554974,True,webcams,"[cameras_sales, logitech, webcams]",[],2022-01-12 12:08:58+00:00,2022-01-12 12:08:59+00:00,,"{'maximum_video_resolution': '640_x_480', 'nam..."
3,12249740,True,scanners,"[avision, print_scan_sales, scanners]",[],2022-01-12 12:09:22+00:00,2022-01-12 12:09:22+00:00,,{'picture': 'c/2/d/f/c2df1e701322f576b6e477653...
4,1111111130,True,clothing,"[master_men_blazers_deals, supplier_zaro]",[],2022-01-12 12:09:24+00:00,2022-01-12 12:09:24+00:00,apollon_yellow,"{'size': 'xs', 'color': 'yellow', 'supplier': ..."


--- products__values --------------------------------------


Unnamed: 0,sku,image,ean,name,weight,display_diagonal,description,release_date,maximum_video_resolution,total_megapixels,...,power_requirements,eu_shoes_size,maximum_frame_rate,image_stabilizer,brand,material,meta_title,composition,care_instructions,main_color
0,1111111171,f/e/9/6/fe960148cdaa746093eb734cd82a36ffe9b6ff...,1234567890183.0,Bag,500000.0,,,NaT,,,...,,,,,,,,,,
1,13620748,,,Samsung UE40ES5500PXZT LED TV,,40.0,"Samsung UE40ES5500PXZT. HD type: Full HD, Disp...",2012-03-29 00:00:00+00:00,,,...,,,,,,,,,,
2,15554974,,,Logitech C170,,,<b>The easy way to start video calling and sen...,2012-09-13 00:00:00+00:00,640_x_480,5.0,...,,,,,,,,,,
3,12249740,,,Avision AV36,,,<b>Mobile power by USB</b>\nDon’t just haul th...,2012-01-05 00:00:00+00:00,,,...,,,,,,,,,,
4,1111111130,8/4/f/9/84f9f9f4a41331b349c54d2a67373fa1b7df1d...,1234567890142.0,Long gray suit jacket and matching pants unstr...,600000.0,,Long gray suit jacket and matching pants unstr...,NaT,,,...,,,,,,,,,,
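
When the CSV files are read back, list-valued columns such as the families' `attributes` come back as plain strings, because CSV has no list type. The following sketch shows one way to restore them using `ast.literal_eval` as a converter. It uses an in-memory buffer instead of the files written above, and the attribute lists are shortened to the entries visible in the families table.

```python
import ast
import io

import pandas as pd

# Two families from the table above, with shortened attribute lists.
families = pd.DataFrame(
    [
        {"code": "mugs", "labels": "Mugs",
         "attributes": ["container_material", "description", "main_color"]},
        {"code": "headphones", "labels": "Headphones",
         "attributes": ["description", "headphone_connectivity", "name"]},
    ]
)

# Write to an in-memory buffer the same way the notebook writes to disk.
buffer = io.StringIO()
families.to_csv(buffer, index=False)

# A plain read returns the list column as its string representation.
buffer.seek(0)
naive = pd.read_csv(buffer)
print(type(naive["attributes"].iloc[0]))

# A converter parses the string back into a Python list.
buffer.seek(0)
parsed = pd.read_csv(buffer, converters={"attributes": ast.literal_eval})
print(type(parsed["attributes"].iloc[0]))
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for this purpose. Columns holding dictionaries can be restored the same way, provided their contents are plain literals.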
