Adds new LabelledSimpleDataset (llama-dataset) #11805

Merged: @logan-markewich merged 1 commit into main from nerdai/add-new-simple-llama-dataset on Mar 11, 2024

@nerdai (Contributor) commented Mar 10, 2024

Description

  • Adds LabelledSimpleDataset, which consists of two fields: reference_label and text
  • This dataset can be used for classification and information extraction (e.g., NER) tasks
  • Needed for the new differential privacy pack I'm creating: the pack will take in a LabelledSimpleDataset, create a privacy-safe synthetic version of it, and store that version as a LabelledSimpleDataset
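A minimal sketch of the data shape described above, assuming only the two field names (reference_label and text). `LabelledExample`, `LabelledDataset`, `to_json`, and `from_json` are hypothetical stand-ins written with plain `dataclasses`; they are not the llama-index `LabelledSimpleDataset` API, and the JSON layout is an assumption chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class LabelledExample:
    """Hypothetical stand-in for one row: a text plus its reference label."""
    reference_label: str
    text: str


@dataclass
class LabelledDataset:
    """Hypothetical stand-in for the dataset: an ordered list of examples."""
    examples: List[LabelledExample]

    def to_json(self) -> str:
        # Serialize each example as a plain dict holding the two fields.
        return json.dumps({"examples": [asdict(e) for e in self.examples]})

    @classmethod
    def from_json(cls, raw: str) -> "LabelledDataset":
        # Rebuild the examples from the (assumed) {"examples": [...]} layout.
        data = json.loads(raw)
        return cls(examples=[LabelledExample(**e) for e in data["examples"]])


# Classification use: the reference label is a class name for each text.
clf = LabelledDataset(
    examples=[
        LabelledExample(reference_label="positive", text="Great product, works as advertised."),
        LabelledExample(reference_label="negative", text="Stopped working after a week."),
    ]
)
```

Because a synthetic (e.g., privacy-safe) copy has the same two fields, it can be stored with the same structure as the original dataset.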

Fixes # (issue)

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • Yes
  • No

Type of Change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update (TODO)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • Added a new notebook that tests end-to-end (TODO)
  • I stared at the code and made sure it makes sense

@dosubot (bot) added the size:L label ("This PR changes 100-499 lines, ignoring generated files.") on Mar 10, 2024
@dosubot (bot) added the lgtm label ("This PR has been approved by a maintainer") on Mar 11, 2024
@logan-markewich merged commit c2272b5 into main on Mar 11, 2024
8 checks passed
@logan-markewich deleted the nerdai/add-new-simple-llama-dataset branch on March 11, 2024 at 16:38
bdonkey added a commit to bdonkey/gpt_index that referenced this pull request Mar 12, 2024
* main:
  add retries for openai LLM rate limit errors (run-llama#11867)
  Fix stream chat param error in CondenseQuestionChatEngine (run-llama#11856)
  Add logprobs field to `CompletionResponse` (run-llama#11855)
  Google readers integrations doc and drive updates (run-llama#11724)
  Adds llamafile support for local LLM and embeddings (run-llama#11709)
  Add QueryUnderstandingAgent LlamaPack (run-llama#11558)
  Split Google docs on headings (run-llama#11535)
  add baidu vectordb as vector store (run-llama#11494)
  [Feature Request]: add PowerPoint support to Confluence reader run-llama#10592 (run-llama#11454)
  add modelscope llm support (run-llama#11270)
  Add cohere command-R model (run-llama#11852)
  Fixed some gramatical mistakes  (run-llama#11840)
  fix obj insert_nodes (run-llama#11836)
  added table comment to Table Info in SQLDatabase (run-llama#11774)
  Fixes for add and delete Methods in LanceDBVectorStore (run-llama#11825)
  add support for open-mixtral-8x7b (run-llama#11792)
  Adds new LabelledSimpleDataset (llama-dataset) (run-llama#11805)
  Fix Cohere Embeddings issue run-llama#11820 (run-llama#11822)
  Update verifying connection (run-llama#11821)
  bump mistralai deps (run-llama#11819)
Izukimat pushed a commit to Izukimat/llama_index that referenced this pull request Mar 29, 2024