diff --git a/docs/source/en/main_classes/agent.mdx b/docs/source/en/main_classes/agent.mdx
index ee910b893b6d0e..953857c410cbba 100644
--- a/docs/source/en/main_classes/agent.mdx
+++ b/docs/source/en/main_classes/agent.mdx
@@ -19,7 +19,7 @@ can vary as the APIs or underlying models are prone to change.
 
 </Tip>
 
-To learn more about agents and tools make sure to read the [introductory guide](../agents_and_tools). This page
+To learn more about agents and tools, make sure to read the [introductory guide](../transformers_agents). This page
 contains the API docs for the underlying classes.
 
 ## Agents
diff --git a/docs/source/en/transformers_agents.mdx b/docs/source/en/transformers_agents.mdx
index 7cf1ce00b2aece..e388c3640ab6de 100644
--- a/docs/source/en/transformers_agents.mdx
+++ b/docs/source/en/transformers_agents.mdx
@@ -266,16 +266,16 @@ with the code generated by the agent.
 We identify a set of tools that can empower such agents. Here is an updated list of the tools we have integrated 
 in `transformers`:
 
-- **Document question answering**: given a document (such as a PDF) in image format, answer a question on this document ([Donut](../model_doc/donut))
-- **Text question answering**: given a long text and a question, answer the question in the text ([Flan-T5](../model_doc/flan-t5))
-- **Unconditional image captioning**: Caption the image! ([BLIP](../model_doc/blip))
-- **Image question answering**: given an image, answer a question on this image ([VILT](../model_doc/vilt))
-- **Image segmentation**: given an image and a prompt, output the segmentation mask of that prompt ([CLIPSeg](../model_doc/clipseg))
-- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](../model_doc/whisper))
-- **Text to speech**: convert text to speech ([SpeechT5](../model_doc/speecht5))
-- **Zero-shot text classification**: given a text and a list of labels, identify to which label the text corresponds the most ([BART](../model_doc/bart))
-- **Text summarization**: summarize a long text in one or a few sentences ([BART](../model_doc/bart))
-- **Translation**: translate the text into a given language ([NLLB](../model_doc/nllb))
+- **Document question answering**: given a document (such as a PDF) in image format, answer a question on this document ([Donut](./model_doc/donut))
+- **Text question answering**: given a long text and a question, answer the question using the text ([Flan-T5](./model_doc/flan-t5))
+- **Unconditional image captioning**: given an image, generate a caption for it ([BLIP](./model_doc/blip))
+- **Image question answering**: given an image, answer a question on this image ([ViLT](./model_doc/vilt))
+- **Image segmentation**: given an image and a prompt, output the segmentation mask of that prompt ([CLIPSeg](./model_doc/clipseg))
+- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](./model_doc/whisper))
+- **Text to speech**: convert text to speech ([SpeechT5](./model_doc/speecht5))
+- **Zero-shot text classification**: given a text and a list of labels, identify the label that best matches the text ([BART](./model_doc/bart))
+- **Text summarization**: summarize a long text in one or a few sentences ([BART](./model_doc/bart))
+- **Translation**: translate the text into a given language ([NLLB](./model_doc/nllb))
 
 These tools have an integration in transformers, and can be used manually as well, for example: