- 
                Notifications
    
You must be signed in to change notification settings  - Fork 942
 
[template] add_retry #6138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[template] add_retry #6138
Conversation
          Summary of ChangesHello @Jintao-Huang, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly improves the reliability of multimodal data processing by implementing a retry mechanism for image URL fetching and standardizing the timeout configuration through a new  Highlights
 Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either  
 Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a  Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
本次 PR 主要在 vision_utils.py 中为从 URL 加载文件增加了重试机制,并使用了 requests.Session 来管理 HTTP 请求,提高了代码的健壮性。同时,将超时控制的环境变量从 TIMEOUT 更改为 SWIFT_TIMEOUT,并将默认值从 300 秒调整为 20 秒。相关的中英文文档也已同步更新。代码实现良好,我只在 vision_utils.py 中提出一个关于使用 with 语句来简化会话管理的小建议,以使代码更加简洁和 Pythonic。
        
          
                swift/llm/template/vision_utils.py
              
                Outdated
          
        
      | session = requests.Session() | ||
| session.mount("http://", HTTPAdapter(max_retries=retries)) | ||
| session.mount("https://", HTTPAdapter(max_retries=retries)) | ||
| 
               | 
          ||
| timeout = float(os.getenv('SWIFT_TIMEOUT', '20')) | ||
| request_kwargs = {'timeout': timeout} if timeout > 0 else {} | ||
| 
               | 
          ||
| try: | ||
| response = session.get(path, **request_kwargs) | ||
| response.raise_for_status() | ||
| content = response.content | ||
| res = BytesIO(content) | ||
| finally: | ||
| session.close() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为了使代码更简洁且符合 Python 风格,建议使用 with 语句来管理 requests.Session。with 语句可以自动处理会话的关闭,从而替代 try...finally 结构,使代码更具可读性。
            with requests.Session() as session:
                session.mount("http://", HTTPAdapter(max_retries=retries))
                session.mount("https://", HTTPAdapter(max_retries=retries))
                timeout = float(os.getenv('SWIFT_TIMEOUT', '20'))
                request_kwargs = {'timeout': timeout} if timeout > 0 else {}
                response = session.get(path, **request_kwargs)
                response.raise_for_status()
                content = response.content
                res = BytesIO(content)| 
           /gemini review  | 
    
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a retry mechanism for downloading images from URLs, making the process more robust against transient network failures. The implementation uses requests.Session with HTTPAdapter and urllib3.util.retry.Retry. The timeout handling has also been updated to use a new environment variable SWIFT_TIMEOUT with a default of 20 seconds. The documentation has been updated accordingly in both Chinese and English to reflect these changes.
My review includes suggestions to improve the efficiency of HTTP requests by reusing a requests.Session object, making the retry logic more configurable, and enhancing the clarity of the documentation.
        
          
                docs/source/Instruction/常见问题整理.md
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, it would be helpful to mention the default timeout value here, similar to how it was documented for the old TIMEOUT variable. This helps users understand the default behavior.
| 设置环境变量`SWIFT_TIMEOUT`。或者`InferClient`中可以传参数。 | |
| 设置环境变量`SWIFT_TIMEOUT`(默认为20秒)。或者`InferClient`中可以传参数。 | 
| 
               | 
          ||
| ### Q12: Can we set request timeout time for Swift3.0 deployment inference? What happens if the image URL is invalid? | ||
| You can set the `TIMEOUT` environment variable, which defaults to 300 seconds. Alternatively, you can pass parameters in `InferClient`. | ||
| You can set the `SWIFT_TIMEOUT` environment variable. Alternatively, you can pass parameters in `InferClient`. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, it would be helpful to mention the default timeout value here, similar to how it was documented for the old TIMEOUT variable. This helps users understand the default behavior.
| You can set the `SWIFT_TIMEOUT` environment variable. Alternatively, you can pass parameters in `InferClient`. | |
| You can set the `SWIFT_TIMEOUT` environment variable, which defaults to 20 seconds. Alternatively, you can pass parameters in `InferClient`. | 
| request_kwargs['timeout'] = timeout | ||
| content = requests.get(path, **request_kwargs).content | ||
| res = BytesIO(content) | ||
| retries = Retry(total=3, backoff_factor=1, allowed_methods=['GET']) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The retry parameters (total and backoff_factor) are hardcoded. It would be more flexible to allow configuring them via environment variables, similar to how SWIFT_TIMEOUT is handled. This would allow users to adjust the retry behavior based on their network conditions without changing the code.
| retries = Retry(total=3, backoff_factor=1, allowed_methods=['GET']) | |
| retries = Retry(total=int(os.getenv('SWIFT_RETRY_TOTAL', 3)), backoff_factor=float(os.getenv('SWIFT_RETRY_BACKOFF_FACTOR', 1)), allowed_methods=['GET']) | 
| with requests.Session() as session: | ||
| session.mount('http://', HTTPAdapter(max_retries=retries)) | ||
| session.mount('https://', HTTPAdapter(max_retries=retries)) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a new requests.Session for each file download is inefficient, especially when downloading multiple images in a batch. A session object should be reused across requests to leverage connection pooling, which can significantly improve performance.
It's recommended to create a single session object at the module level and reuse it for all HTTP requests.
For example:
# At module level
_session = requests.Session()
retries = Retry(total=3, backoff_factor=1, allowed_methods=['GET'])
adapter = HTTPAdapter(max_retries=retries)
_session.mount('http://', adapter)
_session.mount('https://', adapter)
def load_file(...):
    # ...
    if path.startswith('http'):
        # ...
        response = _session.get(path, **request_kwargs)
        # ...While this change is outside the current diff, it's a highly recommended practice for performance.
No description provided.