## Adjusting the format

I was not satisfied with the original layout of the columns, so I reformatted them to represent the archival hierarchy more clearly and to take up less space.

I loaded the Excel file into a DataFrame again and then merged its columns into three main ones:

1) Archival description (the archival hierarchy): the levels follow a fixed order and are separated by ' / '.
2) Internal description (the description of the archive content): no specific order is imposed, and each category is listed on its own line.
3) External description (the description of the item): formatted in the same way as the internal description.
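The two layouts described above can be illustrated on a single record. This is only a minimal sketch: the record is a plain dictionary with invented values, and `join_fields` is a hypothetical helper, not the function used in the notebook.

```python
# Hypothetical record standing in for one DataFrame row; all values are invented.
row = {
    "archivio": "Archivio A",
    "fondo": "Fondo B",
    "serie": "II",
    "faldone": None,  # missing values are skipped
    "opera": "Opera C",
    "tipo": "quaderno",
}

def join_fields(record, columns, sep):
    """Join 'name: value' pairs for the non-missing fields listed in `columns`."""
    return sep.join(
        f"{name}: {record[name]}" for name in columns if record[name] is not None
    )

# Hierarchy on one line, levels separated by ' / '
archival = join_fields(row, ["archivio", "fondo", "serie", "faldone"], " / ")
# Content description, one category per line
internal = join_fields(row, ["opera", "tipo"], "\n")

print(archival)
print(internal)
```

In the notebook itself, missing cells are NaN rather than `None`, so the check is done with `pd.notnull`.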

In [11]:
import pandas as pd


file_path = '/Users/martinapensalfini/Desktop/gadda/df_date_gadda_tipo.xlsx'


df = pd.read_excel(file_path)

print(df.head())


   N           archivio        fondo  \
0  0  Archivio Bonsanti  Fondo Gadda   
1  1  Archivio Bonsanti  Fondo Gadda   
2  2  Archivio Bonsanti  Fondo Gadda   
3  3  Archivio Bonsanti  Fondo Gadda   
4  4  Archivio Bonsanti  Fondo Gadda   

                                               unità serie    sottoserie  \
0                           «Piccola antologia 1904»    II  1 Quaderni 1   
1   «1913. – Gaddus» [appunti di analisi matematica]    II  1 Quaderni 1   
2          «Giornale di campagna vol. 2°. 1915-1916»    II  1 Quaderni 1   
3  [Quaderno di studio del tedesco 10 novembre 1917]    II  1 Quaderni 1   
4  [Quaderno di studio del tedesco 24 aprile 1918...    II  1 Quaderni 1   

   faldone                                              opera  \
0      NaN  Appunti giovanili, scolastici e universitari (...   
1      NaN  Appunti giovanili, scolastici e universitari (...   
2      NaN                                  Quaderni militari   
3      NaN                     Quaderni di

In [12]:
import pandas as pd



# Specify the columns to merge for the original group, first group, and second group
columns_to_merge_original_group = ['archivio', 'fondo', 'serie', 'sottoserie', 'faldone', 'unità']
columns_to_merge_first_group = ['opera', 'schede tematiche', 'descrizione', 'tipo', 'luogo', 'data', 'lib']
columns_to_merge_second_group = ['forma', 'supporto', 'contenitore', 'provenienza', 'lingua',
                                 'stato di conservazione', 'sommario', 'collana']

# Define a function to merge columns in the specified format (with column names)
def merge_original_group(row, columns):
    merged_values = []
    for column_name in columns:
        if pd.notnull(row[column_name]):
            # The f-string converts non-string values (e.g. numbers) to text
            merged_values.append(f"{column_name}: {row[column_name]}")
    return ' / '.join(merged_values)

# Merge and format columns in the original group
df['Merged_Column_Original_Group'] = df.apply(lambda x: merge_original_group(x, columns_to_merge_original_group), axis=1)


# Define a function to merge columns in the specified format (with column names) and separate with a new line
def merge_with_different_format(row, columns):
    merged_values = []
    for column_name, value in row.items():  # Series.iteritems() was removed in pandas 2.0
        if pd.notnull(value):
            if column_name in columns:
                merged_values.append(f"{column_name}: {value}")
    return '\n'.join(merged_values)  # Separate with new line

# Merge the columns from the first group into a new column with the specified format
df['Merged_Column_First_Group'] = df.apply(lambda x: merge_with_different_format(x, columns_to_merge_first_group), axis=1)

# Merge the columns from the second group into another new column with the specified format
df['Merged_Column_Second_Group'] = df.apply(lambda x: merge_with_different_format(x, columns_to_merge_second_group), axis=1)

# Create a new DataFrame containing only the required merged columns
new_df_with_merged_columns = df[['Merged_Column_Original_Group', 'Merged_Column_First_Group', 'Merged_Column_Second_Group']].copy()

# Rename the 'lib' label to 'library'; matching only at the start of a line leaves 'lib' inside values untouched
new_df_with_merged_columns['Merged_Column_First_Group'] = new_df_with_merged_columns['Merged_Column_First_Group'].str.replace(r'(?m)^lib: ', 'library: ', regex=True)

print(new_df_with_merged_columns)


                           Merged_Column_Original_Group  \
0     archivio: Archivio Bonsanti / fondo: Fondo Gad...   
1     archivio: Archivio Bonsanti / fondo: Fondo Gad...   
2     archivio: Archivio Bonsanti / fondo: Fondo Gad...   
3     archivio: Archivio Bonsanti / fondo: Fondo Gad...   
4     archivio: Archivio Bonsanti / fondo: Fondo Gad...   
...                                                 ...   
1578  archivio: Archivio Biblioteca Trivulziana / fo...   
1579  archivio: Archivio Biblioteca Trivulziana / fo...   
1580  archivio: Archivio Biblioteca Trivulziana / fo...   
1581  archivio: Archivio Biblioteca Trivulziana / fo...   
1582  archivio: Archivio Biblioteca Trivulziana / fo...   

                              Merged_Column_First_Group  \
0     opera: Appunti giovanili, scolastici e univers...   
1     opera: Appunti giovanili, scolastici e univers...   
2     opera: Quaderni militari\nschede tematiche: Ap...   
3     opera: Quaderni di studio del tedesco\nschede ...


I then renamed the columns accordingly and saved the result in both Excel and CSV format.

In [13]:
import pandas as pd

# new_df_with_merged_columns is the DataFrame of merged columns built in the previous cell

# Renaming columns in the merged DataFrame
new_column_names = {
    'Merged_Column_Original_Group': 'Archival Description',
    'Merged_Column_First_Group': 'Internal Description',
    'Merged_Column_Second_Group': 'External Description',
    # Add other columns to rename in a similar fashion
}

# Apply the rename operation to the DataFrame
df_with_renamed_columns = new_df_with_merged_columns.rename(columns=new_column_names)

# Display the DataFrame with renamed columns
print(df_with_renamed_columns)


                                   Archival Description  \
0     archivio: Archivio Bonsanti / fondo: Fondo Gad...   
1     archivio: Archivio Bonsanti / fondo: Fondo Gad...   
2     archivio: Archivio Bonsanti / fondo: Fondo Gad...   
3     archivio: Archivio Bonsanti / fondo: Fondo Gad...   
4     archivio: Archivio Bonsanti / fondo: Fondo Gad...   
...                                                 ...   
1578  archivio: Archivio Biblioteca Trivulziana / fo...   
1579  archivio: Archivio Biblioteca Trivulziana / fo...   
1580  archivio: Archivio Biblioteca Trivulziana / fo...   
1581  archivio: Archivio Biblioteca Trivulziana / fo...   
1582  archivio: Archivio Biblioteca Trivulziana / fo...   

                                   Internal Description  \
0     opera: Appunti giovanili, scolastici e univers...   
1     opera: Appunti giovanili, scolastici e univers...   
2     opera: Quaderni militari\nschede tematiche: Ap...   
3     opera: Quaderni di studio del tedesco\nschede ...

In [14]:
df_with_renamed_columns.to_csv("provamergecolumntipo.csv")

In [15]:
df_with_renamed_columns.to_excel("provamergecolumntipo.xlsx")
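Because the internal and external descriptions contain line breaks, it may be worth checking that the CSV reads back as expected. The following is a minimal sketch under stated assumptions: `sample` is a made-up one-row frame standing in for `df_with_renamed_columns`, and an in-memory buffer replaces the file on disk. Passing `index=False` is optional; it omits the row index, which `to_csv` writes by default.

```python
import io

import pandas as pd

# Hypothetical stand-in for df_with_renamed_columns; the values are invented.
sample = pd.DataFrame({
    "Archival Description": ["archivio: Archivio A / fondo: Fondo B"],
    "Internal Description": ["opera: Opera C\ntipo: quaderno"],
})

# Write the CSV text to memory without the row index
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

# Read the CSV text back; the multi-line cell is quoted, so it should survive
buffer.seek(0)
restored = pd.read_csv(buffer)

print(list(restored.columns))
print(restored.loc[0, "Internal Description"])
```

The same check can be run against the real file by passing its name to `pd.read_csv` instead of the buffer.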