### How fast can you sort these products?

The deals section is a list of products on sale, ordered by an ML score and by business rules.

The score ranges from 0 to 1, where 1 is the best possible value, and measures how good an item_id is.

Business rules:
* A Domain_id cannot be repeated within 4 consecutive positions.
* A vertical cannot be repeated in consecutive positions (two adjacent items cannot share a vertical).
* If item 641416750 is present, it must be in position 3. This rule takes precedence over all the others.
* If item 22351223 is present, it must be in position 6. This rule takes precedence over all the others.
* Positions 9, 10 and 11 must always hold items from the "HOME&DECOR" category. This rule takes precedence over the domain and vertical rules.
* Subject to these conditions, items must be ordered from highest to lowest score.
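
To make the rules concrete, here is a small checker that lists the positions where a candidate ordering breaks each rule. It is a minimal sketch and is not part of the original notebook. Items are assumed to be plain dicts keyed by the CSV column names. `find_violations` and the constants are hypothetical names. The domain rule is read as "no domain appears twice within any window of 4 consecutive positions", which is only one possible interpretation of the statement. Some violations next to positions 3, 6 and 9 to 11 are expected, because the pinned and HOME&DECOR rules take precedence.

```python
# Hypothetical constants mirroring the business rules (positions are 1-indexed)
PINNED = {641416750: 3, 22351223: 6}
HOMEDECOR_POSITIONS = (9, 10, 11)
DOMAIN_WINDOW = 4  # assumption: no repeated domain inside any window of 4 positions


def find_violations(items):
    """Return (position, rule) pairs for every rule broken by `items`.

    items: list of dicts with keys item_id, vertical, category, domain, score.
    """
    violations = []
    for pos, item in enumerate(items, start=1):
        # Vertical rule: must differ from the previous item's vertical
        if pos > 1 and item["vertical"] == items[pos - 2]["vertical"]:
            violations.append((pos, "vertical"))
        # Domain rule: must not appear among the previous DOMAIN_WINDOW - 1 items
        window = items[max(0, pos - DOMAIN_WINDOW):pos - 1]
        if any(prev["domain"] == item["domain"] for prev in window):
            violations.append((pos, "domain"))
        # Pinned rules: pinned items must sit at their fixed positions
        expected = PINNED.get(item["item_id"])
        if expected is not None and expected != pos:
            violations.append((pos, "pinned"))
    # HOME&DECOR rule: positions 9, 10 and 11 (when they exist) must be HOME&DECOR
    for pos in HOMEDECOR_POSITIONS:
        if pos <= len(items) and items[pos - 1]["category"] != "HOME&DECOR":
            violations.append((pos, "home&decor"))
    return violations


example = [
    {"item_id": 1, "vertical": "CE", "domain": "A", "category": "X", "score": 0.9},
    {"item_id": 2, "vertical": "CE", "domain": "B", "category": "X", "score": 0.8},
    {"item_id": 3, "vertical": "APP", "domain": "A", "category": "X", "score": 0.7},
]
print(find_violations(example))  # [(2, 'vertical'), (3, 'domain')]
```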

Design an algorithm that returns the final ordered list of items. It must scale efficiently with the number of items and handle cases where a constraint cannot be satisfied.
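
One way to approach the scaling requirement is a greedy pass over a priority queue. At each step, take the highest-scoring item that breaks neither the domain rule nor the vertical rule, and temporarily set aside the items that do. The sketch below illustrates this under several assumptions and is not the notebook's implementation. Items are hypothetical `(score, item_id, vertical, domain)` tuples. The domain rule is read as a window of 4 positions. The pinned positions and the HOME&DECOR slots are left to a separate step. When no remaining item fits, the rules are relaxed for that position instead of failing.

```python
import heapq
from collections import deque


def greedy_order(items, domain_window=4):
    """Return item_ids ordered by score, subject to the domain and vertical rules.

    items: list of (score, item_id, vertical, domain) tuples (hypothetical format).
    domain_window: assumed size of the window in which a domain may not repeat.
    """
    # heapq is a min-heap, so the score is negated to pop the best item first
    heap = [(-score, item_id, vertical, domain) for score, item_id, vertical, domain in items]
    heapq.heapify(heap)
    recent_domains = deque(maxlen=domain_window - 1)  # domains of the last placed items
    last_vertical = None
    order = []

    while heap:
        skipped, chosen = [], None
        # Pop in score order until a candidate satisfies both rules
        while heap:
            candidate = heapq.heappop(heap)
            _, _, vertical, domain = candidate
            if vertical != last_vertical and domain not in recent_domains:
                chosen = candidate
                break
            skipped.append(candidate)
        # Nothing fits: the rules cannot be met here, so take the best skipped item
        if chosen is None:
            chosen, skipped = skipped[0], skipped[1:]
        # Put the set-aside items back for the next positions
        for candidate in skipped:
            heapq.heappush(heap, candidate)
        _, item_id, vertical, domain = chosen
        order.append(item_id)
        recent_domains.append(domain)
        last_vertical = vertical

    return order


print(greedy_order([(0.9, 1, "CE", "A"), (0.8, 2, "CE", "B"), (0.7, 3, "APP", "C")]))
# [1, 3, 2]
```

Each heap operation costs O(log n). When several verticals compete, only a few items are usually set aside per step. In the worst case, where almost every remaining item shares the last vertical, each step scans most of the heap and the total cost degrades toward O(n² log n).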

In [1]:
# Libraries
import time
import pandas as pd
pd.set_option('display.max_rows', None)

In [2]:
# Parameters
NAME_FILE = "ordenamiento.csv"
FILES_PATH = '../Files/'

In [3]:
# Open csv file and view it
df = pd.read_csv(FILES_PATH+NAME_FILE)
print("Number of rows: "+str(df.shape[0]))
df.head()

Number of rows: 5000


| Unnamed: 0 | item_id | vertical | category | domain | score |
|---|---|---|---|---|---|
| 0 | 512208310 | CPG | PETS FOOD | MLC-CATS_AND_DOGS_FOODS | 0.0272 |
| 1 | 468513076 | CE | ELECTRONICS | MLC-RANGES | 0.9256 |
| 2 | 614337410 | CE | ELECTRONICS | MLC-SMART_SPEAKERS | 0.8304 |
| 3 | 634351318 | APP & SPORTS | APPAREL | MLC-PANTS | 0.056 |
| 4 | 528383704 | ACC | VEHICULAR MULTIMEDIA | MLC-GPS | 0.2334 |
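
pandas produces an `Unnamed: 0` column when the first header cell of a CSV is empty. This typically happens when a DataFrame was saved together with its index. If that is the case for `ordenamiento.csv` (an assumption), passing `index_col=0` to `pd.read_csv` loads that column as the index instead of as data. A self-contained sketch using the first two rows shown above:

```python
import io

import pandas as pd

# First rows of the CSV as shown above, assuming an empty first header cell
csv_text = """,item_id,vertical,category,domain,score
0,512208310,CPG,PETS FOOD,MLC-CATS_AND_DOGS_FOODS,0.0272
1,468513076,CE,ELECTRONICS,MLC-RANGES,0.9256
"""

# Without index_col, pandas names the unlabeled first column "Unnamed: 0"
df_default = pd.read_csv(io.StringIO(csv_text))
# With index_col=0, that column becomes the DataFrame index instead
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))  # ['Unnamed: 0', 'item_id', 'vertical', 'category', 'domain', 'score']
print(list(df_indexed.columns))  # ['item_id', 'vertical', 'category', 'domain', 'score']
```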


### Helper functions for the sorting algorithm

In [4]:
def sort_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Main function that builds the sorted dataframe.

    Args:
        df (pd.DataFrame): Dataframe with the information about items and their scores.

    Returns:
        shuffled_df (pd.DataFrame): Dataframe sorted according to the business rules.
    """
    
    # Start measuring execution time
    start_time = time.time()

    # Items pinned to a fixed 1-indexed position; these rules override all the others
    pinned_positions = {641416750: 3, 22351223: 6}

    # Sort the DataFrame by score in descending order
    df = df.sort_values('score', ascending=False)

    # Set aside the pinned items that are present in the data
    pinned_df = df[df['item_id'].isin(list(pinned_positions))]
    df = df.drop(pinned_df.index)

    # Set aside the 3 highest-scoring "HOME&DECOR" items for positions 9, 10 and 11
    # (if fewer than 3 exist, the rule cannot be fully met and only those are used)
    homedeco_df = df[df['category'] == 'HOME&DECOR'].iloc[:3]
    df = df.drop(homedeco_df.index)

    # Reorder the remaining items while respecting the consecutive domain and vertical rules
    shuffled_df = shuffle_dataframe(df)

    # Append the pinned items at the end, then move them in ascending order of target
    # position, so that moving one item never displaces another that is already placed
    shuffled_df = pd.concat([shuffled_df, pinned_df])
    for item_id, position in sorted(pinned_positions.items(), key=lambda kv: kv[1]):
        if item_id in pinned_df['item_id'].values:
            shuffled_df = move_item_to_position(shuffled_df, item_id, position)

    # Insert the "HOME&DECOR" items at positions 9, 10 and 11
    shuffled_df = pd.concat([shuffled_df.iloc[:8], homedeco_df, shuffled_df.iloc[8:]])

    # Reset index
    shuffled_df.reset_index(drop=True, inplace=True)
    
    # End measuring execution time
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")

    return shuffled_df

def move_item_to_position(df: pd.DataFrame, item_id: int, position: int) -> pd.DataFrame:
    """Take the row with a given item_id and move it to the required position.

    Args:
        df (pd.DataFrame): Dataframe used in the algorithm.
        item_id (int): item_id that has to be moved.
        position (int): 1-indexed position where the item_id must end up.

    Returns:
        df (pd.DataFrame): Dataframe where the item_id has been moved.
    """
    # Save the (first) row with the item_id
    row = df[df['item_id'] == item_id].iloc[:1]
    # Remove the row and insert it back at the desired position
    df = df.drop(row.index)
    df = pd.concat([df.iloc[:position-1], row, df.iloc[position-1:]])
    return df


def shuffle_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Reorder the dataframe greedily following the domain and vertical rules.

    At each step, the highest-scoring remaining item whose domain and vertical both
    differ from those of the previously placed item is selected. If no item satisfies
    the rules, the constraint cannot be met at that position, so the rules are relaxed
    and the highest-scoring remaining item is used instead.

    Args:
        df (pd.DataFrame): Dataframe where the scores and items are stored.

    Returns:
        shuffled_df (pd.DataFrame): Reordered dataframe.
    """
    # Work on a copy sorted by score; a stable sort keeps ties in their input order
    remaining = df.sort_values('score', ascending=False, kind='stable')
    selected_index = []
    last_domain, last_vertical = None, None

    while len(remaining) > 0:
        # Filter rows based on the domain and vertical rules. Note: the domain is
        # only compared with the previous item here, not across 4 positions
        if last_domain is None:
            candidates = remaining
        else:
            rule_mask = (remaining['domain'] != last_domain) & (remaining['vertical'] != last_vertical)
            candidates = remaining[rule_mask]

        # If no rows match the rules, relax them instead of looping forever
        if candidates.empty:
            candidates = remaining

        # `remaining` is sorted by score, so the first candidate has the highest score
        best_idx = candidates.index[0]
        selected_index.append(best_idx)
        last_domain = remaining.at[best_idx, 'domain']
        last_vertical = remaining.at[best_idx, 'vertical']
        remaining = remaining.drop(best_idx)

    # Build the result in one step (DataFrame.append was removed in pandas 2.0)
    shuffled_df = df.loc[selected_index]

    # Return
    return shuffled_df
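
The pinned-position rules are easy to get wrong when more than one item is pinned, because moving items one at a time can displace an item that was already placed. For example, suppose `move_item_to_position` is applied once per item and 22351223 starts before position 3. Moving 641416750 to position 3 and then removing 22351223 shifts 641416750 to position 2. The list-based sketch below avoids this: it removes every pinned id first, then inserts them in ascending order of target position. `place_pinned` is a hypothetical helper, not part of the notebook.

```python
def place_pinned(order, pinned):
    """Return `order` with each pinned id moved to its fixed 1-indexed position.

    order: list of ids already sorted by the other rules.
    pinned: {item_id: position}; ids missing from `order` are ignored.
    """
    present_ids = set(order)
    present = {item_id: pos for item_id, pos in pinned.items() if item_id in present_ids}
    # Take every pinned id out first, so that no removal can shift an item already placed
    result = [item_id for item_id in order if item_id not in present]
    # Insert in ascending target position: inserting at position p never moves
    # the items that already occupy positions smaller than p
    for item_id, pos in sorted(present.items(), key=lambda kv: kv[1]):
        result.insert(min(pos - 1, len(result)), item_id)
    return result


# 22351223 starts before position 3, the case that breaks one-at-a-time moves
print(place_pinned([22351223, 10, 20, 641416750, 30, 40, 50],
                   {641416750: 3, 22351223: 6}))
# [10, 20, 641416750, 30, 40, 22351223, 50]
```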


In [5]:
# Executes the algorithm
sortered_dataframe = sort_dataframe(df)
print(sortered_dataframe.shape[0])

Execution time: 14.224012851715088 seconds
5000


In [6]:
# Review the sorted dataframe
sortered_dataframe

| Unnamed: 0 | item_id | vertical | category | domain | score |
|---|---|---|---|---|---|
| 0 | 590602034 | CE | ELECTRONICS | MLC-GAME_CONSOLES | 0.9998 |
| 1 | 523534468 | BEAUTY & HEALTH | PHARMACEUTICS | MLC-SURGICAL_AND_INDUSTRIAL_MASKS | 0.9976 |
| 2 | 641416750 | CE | MOBILE | MLC-CELLPHONES | 0.9986 |
| 3 | 609438042 | CE | ELECTRONICS | MLC-GAME_CONSOLES | 0.9996 |
| 4 | 541283090 | HOME & INDUSTRY | TOOLS AND CONSTRUCTION | MLC-ELECTRIC_DRILLS | 0.9968 |
| 5 | 634352041 | CE | MOBILE | MLC-CELLPHONES | 0.9994 |
| 6 | 610865341 | HOME & INDUSTRY | INDUSTRY | MLC-POINTS_OF_SALE_KITS | 0.996 |
| 7 | 615879515 | CE | MOBILE | MLC-CELLPHONES | 0.9992 |
| 8 | 582188629 | HOME & INDUSTRY | HOME&DECOR | MLC-INDOOR_CURTAINS_AND_BLINDS | 0.9822 |
| 9 | 538407191 | HOME & INDUSTRY | HOME&DECOR | MLC-HOME_LIGHTING_SUPPLIES | 0.9804 |
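
A quick way to audit the result is to compare each row with the previous ones using `Series.shift`. The sketch below is a hypothetical helper, not part of the original notebook, and it reads the domain rule as a window of 4 positions. Non-zero counts are expected where higher-priority rules apply. In the rows above, pinned item 641416750 (CE) at index 2 is followed by another CE item. The HOME&DECOR items at indices 8 and 9 also share the HOME & INDUSTRY vertical. The HOME&DECOR check is vacuously true for frames with fewer than 9 rows.

```python
import pandas as pd


def count_rule_breaks(df: pd.DataFrame, domain_window: int = 4) -> dict:
    """Count rule breaks in an already sorted dataframe (hypothetical audit helper)."""
    df = df.reset_index(drop=True)
    # A row repeats the vertical if it equals the previous row's value;
    # shift() leaves NaN in the first row, which never compares equal
    vertical_repeats = int((df['vertical'] == df['vertical'].shift(1)).sum())
    # Domain rule (assumed window of 4): compare with each of the previous 3 rows
    domain_repeats = int(sum(
        (df['domain'] == df['domain'].shift(k)).sum() for k in range(1, domain_window)
    ))
    # Positions 9, 10 and 11 correspond to 0-based rows 8 to 10
    homedecor_ok = bool((df['category'].iloc[8:11] == 'HOME&DECOR').all())
    return {"vertical_repeats": vertical_repeats,
            "domain_repeats": domain_repeats,
            "homedecor_ok": homedecor_ok}


sample = pd.DataFrame({
    "vertical": ["CE", "CE", "APP"],
    "domain": ["A", "B", "A"],
    "category": ["X", "X", "X"],
})
print(count_rule_breaks(sample))
# {'vertical_repeats': 1, 'domain_repeats': 1, 'homedecor_ok': True}
```

In the notebook, the same helper could be called as `count_rule_breaks(sortered_dataframe)`.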
