Trying to retrieve original file path names in the results #4

Closed
Sid01123 opened this issue Jun 6, 2022 · 4 comments

@Sid01123

Sid01123 commented Jun 6, 2022

Is there a way to get the original pathnames of the images after fit_transform?

I am uploading images to Google Colab and reading them in by their filepaths as "/content/name_of_image", and I want to recover this "/content/name_of_image" after running the clustering.

I tried to extract the pathnames per label using the following code, but I seem to be getting the filepaths of images created in a temporary directory instead:

CODE
Iloc = cl.results['labels']==0
cl.results['pathnames'][Iloc]

OUTPUT
array(['/tmp/clustimage/8732cb41-c72d-4266-b164-ff453d68428a.png',
'/tmp/clustimage/440fecd8-8a9c-49a0-b100-ccfb66107425.png',
'/tmp/clustimage/3c9c38d8-4da9-4e4f-9130-d3836182b8c6.png',
'/tmp/clustimage/85cc4848-1faf-44ea-ae4c-9d9d88bd6323.png',
'/tmp/clustimage/6127e4fb-1c25-4ba9-8d68-56ef482e3db4.png',
'/tmp/clustimage/abcf85e0-af1a-48f1-8861-122122b64e32.png',
'/tmp/clustimage/275bbde0-394d-4ba4-b4d0-1c67da323c8b.png',
'/tmp/clustimage/30b62285-2628-45c0-86b2-fea305cb8db3.png',
'/tmp/clustimage/c47a6867-3c8f-480c-a7bd-b3e7ec4ba334.png',
'/tmp/clustimage/da5c17fc-de2a-4375-b03c-066a0904428a.png'], dtype='<U56')

I wish to get the output as the original filenames that were in the pathnames list.

@erdogant
Owner

erdogant commented Jun 9, 2022

Can you show with an example how this occurs?
When I try the flowers example, it stores the filenames and paths correctly.
The unique identifiers are only used if a data matrix is given as an input.

from clustimage import Clustimage
cl = Clustimage(method='pca', embedding='umap')
# Import data
Xlist = cl.import_example(data='flowers')
# Import data in a standardized manner
X = cl.import_data(Xlist)

X.keys()
dict_keys(['img', 'feat', 'xycoord', 'pathnames', 'labels', 'url', 'filenames'])
print(X['filenames'][0:5])
# array(['0001.png', '0002.png', '0003.png', '0004.png', '0005.png'],


What I can do for the data matrix is use the index names of a pandas DataFrame for naming. That way you can control the naming as you wish.
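
To check whether the returned pathnames are generated identifiers rather than original files, a small helper like the one below may be useful. It is only a sketch: `is_generated_name` is a hypothetical function, not part of clustimage, and it assumes the generated names are UUIDs, as in the `/tmp/clustimage/...` output shown earlier in this thread.

```python
import uuid
from pathlib import PurePosixPath


def is_generated_name(path):
    """Return True if the file stem parses as a UUID.

    Hypothetical helper; assumes generated names look like
    '/tmp/clustimage/<uuid>.png', as in the output reported above.
    """
    stem = PurePosixPath(path).stem
    try:
        uuid.UUID(stem)
    except ValueError:
        # Not a UUID, so most likely an original file name.
        return False
    return True


print(is_generated_name('/tmp/clustimage/8732cb41-c72d-4266-b164-ff453d68428a.png'))  # True
print(is_generated_name('/content/name_of_image'))  # False
```

If every returned pathname passes this check, the input was probably handled as a data matrix rather than as a list of file paths.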

@erdogant
Owner

erdogant commented Jun 9, 2022

I added the functionality to read pandas dataframes.
Update with: pip install -U clustimage

Example:

from clustimage import Clustimage
import pandas as pd
import numpy as np

# Initialize
cl = Clustimage()

# Import data
Xraw = cl.import_example(data='mnist')

print(Xraw)
# array([[ 0.,  0.,  5., ...,  0.,  0.,  0.],
#        [ 0.,  0.,  0., ..., 10.,  0.,  0.],
#        [ 0.,  0.,  0., ..., 16.,  9.,  0.],
#        ...,
#        [ 0.,  0.,  1., ...,  6.,  0.,  0.],
#        [ 0.,  0.,  2., ..., 12.,  0.,  0.],
#        [ 0.,  0., 10., ..., 12.,  1.,  0.]])

filenames = list(map(lambda x: str(x) + '.png', np.arange(0, Xraw.shape[0])))
Xraw = pd.DataFrame(Xraw, index=filenames)

print(Xraw)
#            0    1     2     3     4     5   ...   58    59    60    61   62   63
# 0.png     0.0  0.0   5.0  13.0   9.0   1.0  ...  6.0  13.0  10.0   0.0  0.0  0.0
# 1.png     0.0  0.0   0.0  12.0  13.0   5.0  ...  0.0  11.0  16.0  10.0  0.0  0.0
# 2.png     0.0  0.0   0.0   4.0  15.0  12.0  ...  0.0   3.0  11.0  16.0  9.0  0.0
# 3.png     0.0  0.0   7.0  15.0  13.0   1.0  ...  7.0  13.0  13.0   9.0  0.0  0.0
# 4.png     0.0  0.0   0.0   1.0  11.0   0.0  ...  0.0   2.0  16.0   4.0  0.0  0.0
#       ...  ...   ...   ...   ...   ...  ...  ...   ...   ...   ...  ...  ...
# 1792.png  0.0  0.0   4.0  10.0  13.0   6.0  ...  2.0  14.0  15.0   9.0  0.0  0.0
# 1793.png  0.0  0.0   6.0  16.0  13.0  11.0  ...  6.0  16.0  14.0   6.0  0.0  0.0
# 1794.png  0.0  0.0   1.0  11.0  15.0   1.0  ...  2.0   9.0  13.0   6.0  0.0  0.0
# 1795.png  0.0  0.0   2.0  10.0   7.0   0.0  ...  5.0  12.0  16.0  12.0  0.0  0.0
# 1796.png  0.0  0.0  10.0  14.0   8.0   1.0  ...  8.0  12.0  14.0  12.0  1.0  0.0

# Fit and transform data
results = cl.fit_transform(Xraw)

print(results['filenames'])
# array(['0.png', '1.png', '2.png', ..., '1794.png', '1795.png', '1796.png'],
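
To get back to the original question (names per cluster label), the names can be grouped by label once they come from the DataFrame index. The following is a minimal sketch, not part of clustimage: `filenames_per_label` is a hypothetical helper, and the small `results` dict is a stand-in that assumes `results['labels']` and `results['filenames']` are aligned element by element, as the indexing in the first comment implies.

```python
from collections import defaultdict


def filenames_per_label(labels, filenames):
    """Group names by cluster label (hypothetical helper).

    Works on plain lists and on numpy arrays, as long as both
    sequences have the same order and length.
    """
    groups = defaultdict(list)
    for label, name in zip(labels, filenames):
        groups[int(label)].append(str(name))
    return dict(groups)


# Stand-in for the dict returned by cl.fit_transform(Xraw); values are made up.
results = {
    'labels': [0, 1, 0, 2],
    'filenames': ['0.png', '1.png', '2.png', '3.png'],
}

groups = filenames_per_label(results['labels'], results['filenames'])
print(groups[0])  # ['0.png', '2.png']
```

Using the original paths (for example "/content/name_of_image") as the DataFrame index should then give those paths back per label, assuming the index is passed through unchanged.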

@Sid01123
Author

Sid01123 commented Jun 9, 2022 via email

@erdogant
Owner

I am closing this one. Re-open this issue if required.
