Deeplake Transform failed error #6

Closed
sai-krishna-msk opened this issue Apr 26, 2023 · 4 comments

Comments

@sai-krishna-msk

The following is the full error output:

fatal: destination path './gumroad' already exists and is not an empty directory.
Created a chunk of size 1020, which is longer than the specified 1000
Created a chunk of size 1540, which is longer than the specified 1000
This dataset can be visualized in Jupyter Notebook by ds.visualize() or at https://app.activeloop.ai/sai13579/code_repo_qa2

hub://sai13579/code_repo_qa2 loaded successfully.

Deep Lake Dataset in hub://sai13579/code_repo_qa2 already exists, loading from the storage
Dataset(path='hub://sai13579/code_repo_qa2', tensors=[])

 tensor    htype    shape    dtype  compression
 -------  -------  -------  -------  -------
Evaluating ingest: 0%|                                                                                    | 0/1 [00:02<? 
Error in sys.excepthook:
Traceback (most recent call last):
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\humbug\report.py", line 540, in _hook 
    self.error_report(error=exception_instance, tags=tags, publish=publish)
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\humbug\report.py", line 274, in error_report
    traceback.format_exception(
TypeError: format_exception() got an unexpected keyword argument 'etype'

Original exception was:
Traceback (most recent call last):
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\deeplake\core\transform\transform_tensor.py", line 117, in append
    raise TensorDoesNotExistError(self.name)
deeplake.util.exceptions.TensorDoesNotExistError: "Tensor 'text' does not exist."

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\deeplake\util\transform.py", line 207, in _transform_and_append_data_slice
    out = transform_sample(sample, pipeline, tensors)
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\deeplake\util\transform.py", line 75, in transform_sample
    fn(out, result, *args, **kwargs)
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\langchain\vectorstores\deeplake.py", line 219, in ingest
    sample_out.append(
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\deeplake\core\transform\transform_dataset.py", line 67, in append
    self[k].append(v)
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\deeplake\core\transform\transform_tensor.py", line 127, in append
    raise SampleAppendError(self.name, item) from e
deeplake.util.exceptions.SampleAppendError: Failed to append the sample [core]
        repositoryformatversion = 0
        filemode = false
        bare = false
        logallrefupdates = true
        symlinks = false
        ignorecase = true
[remote "origin"]
        url = https://github.com/sai-krishna-msk/VtopScrapper
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
        remote = origin
        merge = refs/heads/master to the tensor 'text'. See more details in the traceback.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\320117176\OneDrive - Philips\Documents\projects\ai_agent\Chat-with-Github-Repo\github.py", line 53, in <module>
    main(repo_url, root_dir, deeplake_repo_name, deeplake_username)
  File "c:\Users\320117176\OneDrive - Philips\Documents\projects\ai_agent\Chat-with-Github-Repo\github.py", line 44, in main
    db.add_documents(texts)
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\langchain\vectorstores\base.py", line 61, in add_documents
    return self.add_texts(texts, metadatas, **kwargs)
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\langchain\vectorstores\deeplake.py", line 236, in add_texts
    ingest().eval(
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\deeplake\core\transform\transform.py", line 99, in eval
    pipeline.eval(
  File "C:\Users\320117176\AppData\Local\anaconda3\envs\ai_agent\lib\site-packages\deeplake\core\transform\transform.py", line 298, in eval
    raise TransformError(
deeplake.util.exceptions.TransformError: Transform failed at index 0 of the input data on the item: [('[core]\n\trepositoryformatversion = 0\n\tfilemo...n\\HEAD'}, 'a217eccf-e42f-11ed-94dd-f47b099e160e')]. See traceback for more details. 
  • I have tried several different repositories; the error posted above was produced with this repository.
  • The error is raised at line 44 of github.py:
db.add_documents(texts)

Can anyone please help me understand and resolve this issue?

Thank you in advance 🙌
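
A side note on the first traceback in the log above: the `TypeError` about `etype` is raised inside the error-reporting hook (`humbug`), not by the transform itself, so it is a secondary failure and the `TensorDoesNotExistError` below it is the one that matters. It is most likely a Python version effect: from Python 3.10 onward the first parameter of `traceback.format_exception` is positional-only and no longer called `etype`. A minimal sketch of a call that works on both older and newer interpreters, where `format_error` is a hypothetical helper name and not part of `humbug`:

```python
import traceback


def format_error(exc: BaseException) -> str:
    """Render an exception as text without using the removed `etype=` keyword."""
    # Passing all three values positionally is accepted both before and
    # after Python 3.10, where the first parameter became positional-only.
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    return "".join(lines)


try:
    raise ValueError("boom")
except ValueError as err:
    report = format_error(err)

print(report)
```

Upgrading the reporting package or running on an older Python should also make that secondary error go away, though neither was confirmed in this thread.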

@FayazRahman

Hey @sai-krishna-msk, it looks like your dataset has no tensors. You can create tensors using ds.create_tensor. Do tell me if you need more help!

@sai-krishna-msk
Author

> Hey @sai-krishna-msk, it looks like your dataset has no tensors. You can create tensors using ds.create_tensor. Do tell me if you need more help!

@FayazRahman, thank you for the swift response.

I'm sorry, but I have never worked with the deeplake package before, and I still don't understand what the issue is. Could you tell me what I am missing? (When you say my dataset does not have tensors, do you mean the GitHub repo I am working with has no code?) When you have time, could you please elaborate and point me to where I need to modify the code?

Your help is much appreciated.

On a side note, I was able to make the code work.

I first tried the code with my private repo (let's call it repo-1), which threw the error I posted above. I then tried a public repo (let's call it repo-2), but it still did not work. After some debugging, I found that even though I had changed the URL to repo-2, the code was still working with repo-1's files. Once I deleted the gumroad directory (which the code creates to store the repo files), the code worked with repo-2.

That bug aside, I am still trying to figure out why the code did not work with repo-1.

I will post an update if I find out.

If anyone else figures it out, please let me know. Thank you in advance.
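
The stale-directory behaviour described in this comment matches the first line of the log (`fatal: destination path './gumroad' already exists and is not an empty directory.`): `git clone` refuses to write into a non-empty directory, so the old files stay in place. A minimal sketch of clearing the target before cloning, assuming the script shells out to `git`; `prepare_clone_dir` and `clone_fresh` are hypothetical helper names, not functions from github.py:

```python
import shutil
import subprocess
from pathlib import Path


def prepare_clone_dir(root_dir: str) -> Path:
    """Delete a leftover clone directory so the next clone starts clean."""
    path = Path(root_dir)
    if path.exists():
        # Removes the previous repository's files, including its .git folder.
        shutil.rmtree(path)
    return path


def clone_fresh(repo_url: str, root_dir: str) -> None:
    """Clone repo_url into root_dir, replacing whatever was there before."""
    path = prepare_clone_dir(root_dir)
    # check=True raises CalledProcessError if git exits with an error.
    subprocess.run(["git", "clone", repo_url, str(path)], check=True)
```

On Windows, read-only files under `.git` can make `shutil.rmtree` fail with a permission error, so an error handler that clears the read-only flag may be needed there.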

@sanchitram1
Contributor

I ran the following in a new script, and it worked:

import os

import deeplake

# read the Deep Lake API token from an environment variable (placeholder name)
api_key = os.getenv("<deeplake_api>")

# create an empty "data store" on deeplake. overwrite=True so I could keep reusing it
ds = deeplake.empty('hub://<your organization from deeplake>/<whatever you want to call it>', token=api_key, overwrite=True)

# create tensors mimicking the output sample from github.py
ds.create_tensor("ids")
ds.create_tensor("metadata")
ds.create_tensor("embedding")
ds.create_tensor("text", htype="text")

IMO it's worth adding this to the instructions. I think what's going on here is that the github.py script outputs tensors in the layout ['ids', 'metadata', 'embedding', 'text'], so you need to mimic that structure in your deeplake datastore.
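
If the dataset already exists and you would rather not overwrite it, the same idea can be written as a check that only creates the tensors that are missing. This is a sketch under assumptions: `ensure_tensors` is a hypothetical helper, and it relies on `ds.tensors` behaving like a mapping keyed by tensor name and on `ds.create_tensor(name, htype=...)` as used in the script above.

```python
# Tensor layout taken from the comment above; None means "no explicit htype".
EXPECTED_TENSORS = {
    "ids": None,
    "metadata": None,
    "embedding": None,
    "text": "text",
}


def ensure_tensors(ds, expected=EXPECTED_TENSORS):
    """Create any expected tensor missing from ds and return the names created."""
    created = []
    for name, htype in expected.items():
        if name in ds.tensors:
            continue  # already present, leave it untouched
        if htype is None:
            ds.create_tensor(name)
        else:
            ds.create_tensor(name, htype=htype)
        created.append(name)
    return created
```

Calling `ensure_tensors(ds)` before `db.add_documents(texts)` would then return the list of tensors it had to create, or an empty list if the layout was already in place.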

@sai-krishna-msk
Author

Thank you @sanchitram1, I think that should fix it.

I could not figure out the issue, but based on the error messages it was clearly a Deeplake issue, so I swapped Deeplake out for Pinecone as the vector database.

It is currently working with Pinecone, which I found much simpler to work with than Deeplake (although I am sure there are reasonable tradeoffs between the two).

Here is the working code for the same project, but with Pinecone: Pinecone version of Chat-with-Github

Note
Hi @peterw, I have credited you in my repo. Please let me know if that does not suffice, and I will do what is necessary.
