In [1]:
import pandas as pd

**Read the .csv database file as a pandas DataFrame** 

In [2]:
dataset_df = pd.read_csv("CH_Dataset_Database.csv", sep=" ")
print('Size of the DataFrame = ', dataset_df.shape)

Size of the DataFrame =  (104813, 10)


**To limit the number of rows displayed for a DataFrame, set the display max_rows option. The default is 60 rows.**

In [3]:
pd.options.display.max_rows = 10
display(dataset_df)

Unnamed: 0,Image_Number,Case,Coefficient_'b',lambda,Grid_Size,Seed,Cahn_Hilliard_Simulation_Index,Displacement_Result_File_Number,Displacement_Result_Index,Delta_Psi_Reaction_Force_Result_Index
0,1,1,70.0201,0.016564,37,290,1,1,0,0
1,2,1,70.0201,0.016564,37,290,1,1,1,1
2,3,1,70.0201,0.016564,37,290,1,1,2,2
3,4,1,70.0201,0.016564,37,290,1,1,3,3
4,5,1,70.0201,0.016564,37,290,1,1,4,4
...,...,...,...,...,...,...,...,...,...,...
104808,104809,3,99.9750,0.010000,41,355,2072,20,1485,29605
104809,104810,3,99.9750,0.010000,41,355,2072,20,1486,29606
104810,104811,3,99.9750,0.010000,41,355,2072,20,1487,29607
104811,104812,3,99.9750,0.010000,41,355,2072,20,1488,29608
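
**Optional: limit the displayed rows temporarily**

Assigning to `pd.options.display.max_rows` changes the setting for the rest of the session. As a minimal sketch, assuming only a temporary change is wanted, `pd.option_context` applies the option inside a `with` block and restores the previous value afterwards. The frame `toy_df` below is a hypothetical stand-in for `dataset_df`, so the snippet runs without the .csv file.

```python
import pandas as pd

# Hypothetical stand-in for dataset_df: 100 rows with one column.
toy_df = pd.DataFrame({"Image_Number": range(1, 101)})

# Remember the current setting so it can be compared afterwards.
rows_before = pd.get_option("display.max_rows")

# Inside the block at most 10 rows are rendered; the option reverts on exit.
with pd.option_context("display.max_rows", 10):
    rows_inside = pd.get_option("display.max_rows")
    print(toy_df)

rows_after = pd.get_option("display.max_rows")
print("max_rows inside the block:", rows_inside)
print("max_rows restored:", rows_after == rows_before)
```

This keeps later cells from being affected by a display setting chosen for one large table.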


**List the headers in the .csv file** 

In [4]:
print(list(dataset_df.columns))

['Image_Number', 'Case', "Coefficient_'b'", 'lambda', 'Grid_Size', 'Seed', 'Cahn_Hilliard_Simulation_Index', 'Displacement_Result_File_Number', 'Displacement_Result_Index', 'Delta_Psi_Reaction_Force_Result_Index']


- ''Image_Number'': the global image number, assigned incrementally from 1 to 104,813
- ''Case'': the case number (1, 2, or 3), identifying the initial concentration from which the pattern was generated
- ''Coefficient_'b' '': the peak-to-valley value of the symmetric double-well chemical free-energy function
- ''lambda'': the parameter $\lambda$ related to the thickness of the interfaces between the two distinct phases
- ''Grid_Size'': the size of the grid on which the concentration was initialized
- ''Seed'': the seed value with which the uniform random variable r was generated
- ''Cahn_Hilliard_Simulation_Index'': the index (1 to 2,072) of the Cahn-Hilliard simulation that produced the image
- ''Displacement_Result_File_Number'': the number (1 to 20) of the text file in which the displacement results in x and in y are saved
- ''Displacement_Result_Index'': the row index of the displacement results for **d = 0.5** in the corresponding ''.txt'' file, for each pattern
- ''Delta_Psi_Reaction_Force_Result_Index'': the row index of both the change in strain energy results and the total reaction force in the x and y directions for **d = [0.0, 0.001, 0.1, 0.2, 0.3, 0.4, 0.5]**, for each pattern
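
**Sketch: how images map to simulations**

Several images share one ''Cahn_Hilliard_Simulation_Index'', so grouping on that column summarizes which image numbers belong to each simulation. The snippet below is a minimal sketch on a toy frame: the four rows are a small subset copied from outputs shown in this notebook, not the full database, so the counts it prints describe the toy frame only.

```python
import pandas as pd

# Toy subset (assumption: only a few rows, copied from this notebook's outputs).
toy_df = pd.DataFrame({
    "Image_Number": [2369, 2370, 2371, 3721],
    "Case": [1, 1, 1, 1],
    "Coefficient_'b'": [71.9933, 71.9933, 71.9933, 73.0],
    "lambda": [0.024591, 0.024591, 0.024591, 0.021101],
    "Grid_Size": [11, 11, 11, 47],
    "Seed": [203, 203, 203, 548],
    "Cahn_Hilliard_Simulation_Index": [50, 50, 50, 77],
})

# For each simulation: first image, last image, and number of images.
summary = (
    toy_df.groupby("Cahn_Hilliard_Simulation_Index")["Image_Number"]
    .agg(["min", "max", "count"])
)
print(summary)
```

Run against the full `dataset_df`, the same call should give one row per simulation.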

**Example to find information on Image 3721**

In [5]:
dataset_df[dataset_df['Image_Number'] == 3721]

Unnamed: 0,Image_Number,Case,Coefficient_'b',lambda,Grid_Size,Seed,Cahn_Hilliard_Simulation_Index,Displacement_Result_File_Number,Displacement_Result_Index,Delta_Psi_Reaction_Force_Result_Index
3720,3721,1,73.0,0.021101,47,548,77,2,1844,3720


**Example to get all images from a given Cahn-Hilliard simulation of the 2,072 simulations (1 $\rightarrow$ 2,072) as well as the simulation parameters**

In [6]:
dataset_df.loc[dataset_df['Cahn_Hilliard_Simulation_Index'] == 50, ['Image_Number', 'Case', "Coefficient_'b'", 'lambda', 'Grid_Size', 'Seed']]

Unnamed: 0,Image_Number,Case,Coefficient_'b',lambda,Grid_Size,Seed
2368,2369,1,71.9933,0.024591,11,203
2369,2370,1,71.9933,0.024591,11,203
2370,2371,1,71.9933,0.024591,11,203
2371,2372,1,71.9933,0.024591,11,203
2372,2373,1,71.9933,0.024591,11,203
...,...,...,...,...,...,...
2411,2412,1,71.9933,0.024591,11,203
2412,2413,1,71.9933,0.024591,11,203
2413,2414,1,71.9933,0.024591,11,203
2414,2415,1,71.9933,0.024591,11,203


**Example to get all images from the $50^{th}$ Cahn-Hilliard simulation and output them in a list**

In [7]:
img_numbers = dataset_df.loc[dataset_df['Cahn_Hilliard_Simulation_Index'] == 50,['Image_Number']].values.tolist()
list_img_numbers = [item for sublist in img_numbers for item in sublist]

print('Image numbers from the 50th simulation:',list_img_numbers)

Image numbers from the 50th simulation: [2369, 2370, 2371, 2372, 2373, 2374, 2375, 2376, 2377, 2378, 2379, 2380, 2381, 2382, 2383, 2384, 2385, 2386, 2387, 2388, 2389, 2390, 2391, 2392, 2393, 2394, 2395, 2396, 2397, 2398, 2399, 2400, 2401, 2402, 2403, 2404, 2405, 2406, 2407, 2408, 2409, 2410, 2411, 2412, 2413, 2414, 2415, 2416]
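
**Alternative: select the column as a Series to get a flat list directly**

Selecting with a list of columns (`['Image_Number']`) returns a DataFrame, so `.values.tolist()` produces a list of one-element lists that then has to be flattened. As a minimal sketch, passing the column name as a plain string returns a Series, whose `tolist()` is already flat. The frame `toy_df` is a hypothetical stand-in holding three rows taken from the outputs in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for dataset_df with only the two columns needed here.
toy_df = pd.DataFrame({
    "Image_Number": [2369, 2370, 3721],
    "Cahn_Hilliard_Simulation_Index": [50, 50, 77],
})

# Boolean mask selecting the rows of simulation 50.
mask = toy_df["Cahn_Hilliard_Simulation_Index"] == 50

# A string column label (not a list) yields a Series, so tolist() is flat.
list_img_numbers = toy_df.loc[mask, "Image_Number"].tolist()
print("Image numbers from the 50th simulation:", list_img_numbers)
```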


**Example to get simulation parameters from the $50^{th}$ Cahn-Hilliard simulation and output them in a list**

In [8]:
simulation_parameters = dataset_df.loc[dataset_df['Cahn_Hilliard_Simulation_Index'] == 50,['Case', "Coefficient_'b'", 'lambda', 'Grid_Size', 'Seed']].values.tolist()

simulation_parameters[0]

[1.0, 71.9933, 0.0245906, 11.0, 203.0]
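
**Note: keeping integer parameters as integers**

In the output above, ''Case'', ''Grid_Size'', and ''Seed'' appear as floats (1.0, 11.0, 203.0) because `.values` converts the mixed integer and float columns to a single float array. As a minimal sketch, assuming a dictionary keyed by column name is acceptable instead of a list, `to_dict(orient="records")` converts each column separately and so keeps the integer columns as integers. The one-row `toy_df` is a hypothetical stand-in built from the values printed above.

```python
import pandas as pd

# Hypothetical one-row stand-in using the parameter values printed above.
toy_df = pd.DataFrame({
    "Case": [1],
    "Coefficient_'b'": [71.9933],
    "lambda": [0.0245906],
    "Grid_Size": [11],
    "Seed": [203],
    "Cahn_Hilliard_Simulation_Index": [50],
})

columns = ["Case", "Coefficient_'b'", "lambda", "Grid_Size", "Seed"]
mask = toy_df["Cahn_Hilliard_Simulation_Index"] == 50

# One dict per matching row; every row of a simulation shares the same
# parameters, so the first record is enough.
simulation_parameters = toy_df.loc[mask, columns].to_dict(orient="records")[0]
print(simulation_parameters)
```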

**Example to find the file number where the displacement results are saved and the row indices in that file for a range of images (Image 2000 $\rightarrow$ Image 2009; note that range(2000,2010) excludes its end value)**

In [9]:
dataset_df.loc[dataset_df['Image_Number'].isin(range(2000,2010)),['Image_Number','Displacement_Result_File_Number','Displacement_Result_Index']]

Unnamed: 0,Image_Number,Displacement_Result_File_Number,Displacement_Result_Index
1999,2000,2,123
2000,2001,2,124
2001,2002,2,125
2002,2003,2,126
2003,2004,2,127
2004,2005,2,128
2005,2006,2,129
2006,2007,2,130
2007,2008,2,131
2008,2009,2,132
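
**Note: half-open versus inclusive ranges**

Python's `range(2000, 2010)` stops at 2009, which is why the table above has ten rows. As a minimal sketch, assuming the last image should be included as well, `Series.between` is inclusive on both ends by default. The frame `toy_df` is a hypothetical stand-in containing only an ''Image_Number'' column.

```python
import pandas as pd

# Hypothetical stand-in for dataset_df: image numbers 1990 to 2020.
toy_df = pd.DataFrame({"Image_Number": range(1990, 2021)})

# Half-open selection: 2000 up to and including 2009.
half_open = toy_df.loc[toy_df["Image_Number"].isin(range(2000, 2010)), "Image_Number"]

# Inclusive selection: 2000 up to and including 2010.
inclusive = toy_df.loc[toy_df["Image_Number"].between(2000, 2010), "Image_Number"]

print("isin(range(2000, 2010)):", half_open.min(), "to", half_open.max(), "->", len(half_open), "rows")
print("between(2000, 2010):    ", inclusive.min(), "to", inclusive.max(), "->", len(inclusive), "rows")
```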
