Issue of Loading Data #60

hytting · 2024-01-08T11:39:23Z

Hi there,

When I load the train/val/test CSV files that I split myself with load_data_from_folder from multimodal_transformers.data, the returned train_dataset/val_dataset/test_dataset have unexpected lengths that do not match the row counts of the original CSV files.
For example, train_df.shape = (105195, 25), while train_dataset.cat_feats.shape = (131495, 38).

To split the dataset, I tried both train_test_split and np.split, and both led to the same loading issue.

But if I follow the exact code from the notebook for splitting the datasets, load_data_from_folder works fine. However, if I then modify one column, for example mapping the numbers to letters, from [0,2,0...] to [A,B,A...], the data again fails to load correctly.

Does anyone have any suggestions?

hytting · 2024-01-09T08:54:32Z

Hi, I have found the problem. When I ran pip install multimodal-transformers, the 0.11a0 version was somehow installed instead of the latest one. In 0.11a0 there is a bug in the load_data.py file, which is fixed in the newest version: train_df=data_df.iloc[:len_train].
(The old version uses df.loc[train_df.index].)
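
Here is a minimal sketch of why the two selections can differ. It assumes (this is not confirmed from the library source) that the splits are concatenated while each keeps its own 0-based index, as they would after being read from separate CSV files. The toy frames and the column name x are hypothetical. With duplicate labels, label-based .loc returns every row that shares a label, while positional .iloc returns exactly the first len_train rows:

```python
import pandas as pd

# Hypothetical toy splits; each has its own 0-based index, as after pd.read_csv.
train_df = pd.DataFrame({"x": [10, 11, 12, 13]})
val_df = pd.DataFrame({"x": [20, 21]})
test_df = pd.DataFrame({"x": [30, 31]})

# Assumption: the frames are concatenated without ignore_index=True,
# so labels 0 and 1 each appear three times in data_df.
data_df = pd.concat([train_df, val_df, test_df])
len_train = len(train_df)

# Label-based selection: every row whose label is in train_df.index is returned,
# including the val/test rows that happen to share labels 0 and 1.
by_label = data_df.loc[train_df.index]

# Position-based selection: exactly the first len_train rows.
by_position = data_df.iloc[:len_train]

print(len(train_df), len(by_label), len(by_position))
```

In this toy case the label-based selection returns more rows than train_df contains, which would be consistent with the inflated length I saw, while the positional slice matches the original training rows.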

So I manually changed the py file and it works now.
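
For anyone hitting the same thing, a quick way to see which version is actually installed is to ask the standard library's importlib.metadata. This is only a sketch: the helper name installed_version is hypothetical, and it assumes the distribution name is the same one used with pip above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution name taken from the pip command above.
print(installed_version("multimodal-transformers"))
```

If this prints 0.11a0, upgrading with pip install --upgrade multimodal-transformers may be a cleaner fix than editing the installed file by hand.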

hytting closed this as completed Jan 9, 2024
