
turning off crew ai telemetry #241

Closed
wijjj opened this issue Feb 9, 2024 · 13 comments

Comments

@wijjj

wijjj commented Feb 9, 2024

Dear crew AI crew.

Considering crewai/telemetry/telemetry.py: Is it possible to turn this off? It throws a lot of errors in an air-gapped environment and causes an unnecessary cascade of logging (due to the system's security monitoring). I'm using crewai 0.5.5 with ollama 0.1.24 (and Python 3.10 on Ubuntu 22.04). Here is what I get from running your example (using ollama):

Exception while exporting Span batch.
Traceback (most recent call last):
  File "/repo/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/repo/.venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/repo/.venv/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/repo/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/repo/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/repo/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/repo/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
  File "/repo/.venv/lib/python3.10/site-packages/urllib3/connection.py", line 179, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7f80230d2d10>, 'Connection to telemetry.crewai.com timed out. (connect timeout=10)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/repo/.venv/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/repo/.venv/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/repo/.venv/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='telemetry.crewai.com', port=4318): Max retries exceeded with url: /v1/traces (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f80230d2d10>, 'Connection to telemetry.crewai.com timed out. (connect timeout=10)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/repo/.venv/lib/python3.10/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 368, in _export_batch
    self.span_exporter.export(self.spans_list[:idx])  # type: ignore
  File "/repo/.venv/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 145, in export
    resp = self._export(serialized_data)
  File "/repo/.venv/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/http/trace_exporter/__init__.py", line 114, in _export
    return self._session.post(
  File "/repo/.venv/lib/python3.10/site-packages/requests/sessions.py", line 635, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/repo/.venv/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/repo/.venv/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/repo/.venv/lib/python3.10/site-packages/requests/adapters.py", line 553, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='telemetry.crewai.com', port=4318): Max retries exceeded with url: /v1/traces (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f80230d2d10>, 'Connection to telemetry.crewai.com timed out. (connect timeout=10)'))

Obviously I can't post the connected logging that this triggers, and I won't be able to create a firewall rule for this as of now. That's why I'm thinking it would be nicer if there were a way to turn this off. Otherwise I'm looking at patching it out.

Thanks in advance.
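Until an official switch exists, one interim workaround is to monkey-patch the telemetry class with a no-op before any crew is built. This is only a sketch: the module path `crewai.telemetry` is inferred from the file mentioned above and may differ between versions, and the stand-in class below just demonstrates the pattern.

```python
# Hedged workaround sketch: swap crewAI's Telemetry class for a no-op
# before any Crew is constructed, so no spans are ever exported.
# (crewai/telemetry/telemetry.py is the file this issue is about.)

class NoopTelemetry:
    """Drop-in replacement: every attribute lookup yields a silent no-op."""

    def __getattr__(self, name):
        # Any method call on this object does nothing and returns None.
        return lambda *args, **kwargs: None

# In user code (module path is an assumption, check your installed version):
#   import crewai.telemetry
#   crewai.telemetry.Telemetry = NoopTelemetry

t = NoopTelemetry()
print(t.crew_creation(object()))  # no network call; prints None
```

The `__getattr__` trick means the shim keeps working even if new telemetry methods are added in later versions.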

@joaomdmoura
Owner

Oh I see, I think I know what the problem might be! All right, working on a fix now; I'll push to main soon and cut a new version with it today.

@edisonzf2020

This problem still exists. Has it been updated?

@joaomdmoura
Owner

Almost finished, it should go live soon! I'm just wrapping up another feature.

@wijjj
Author

wijjj commented Feb 10, 2024

> Oh I see, I think I know what the problem might be! all right working on a fix now, will push to main soon and will cut a new version with it today

Thanks mate! Great work by the way! :)

@StefanDanielSchwarz

Hi @joaomdmoura,

I see you closed this as "Completed", but there's no reference to the changes you made. What was the solution? Is telemetry now optional, and can it be fully disabled?

While I understand why it's a useful feature for you, and I'd happily turn it on to help you focus on the most used features, there are some important disadvantages: Some data is more anonymous than other data, and sending e.g. which model is used would be problematic if it's a self-made one that's unique to the user. Also, mandatory telemetry would preclude using crewAI in certain places, and especially here in Europe/Germany a mandatory or on-by-default telemetry is always a detriment to a software's acceptance.

Of course we could just patch it out, but it would leave a much better impression if crewAI didn't collect such data by default, or at the very least allowed opting out. I hope you agree with that. Thanks for your consideration!

@joaomdmoura
Owner

Hey @StefanDanielSchwarz,

Sorry for not closing the loop on this one. I made a few changes in the last version:

  • errors within telemetry no longer interfere with the execution
  • added an extra option for people to opt in to sharing more sensitive data; it's an option on the crew instance now.

The focus is very much on anonymous data only, so we don't collect any PII, as that is the biggest issue with legislation; we don't even go near IPs and such.
About the model, as an example: we get the name and provider, the goal being to enable us to test on those models as we ship something new and make sure they work great. If it is a closed proprietary model, we have no means to know what it is, what data it was trained on, who used it, or what it is about; even if it's specific to one user, we wouldn't be able to tell.

Let me give some thought to the next steps here. This data is key to helping us decide what to build next, so I want to be mindful about how we go about it. There are a few options, like removing even more items from the default telemetry; whatever we decide to do in the next versions, I'll make sure to share it with the community.

@StefanDanielSchwarz

Not sure about other devs or orgs, but it's common practice to include names with the models, so that alone pretty much makes the data no longer anonymous, and collecting it without prior consent would be problematic. So I hope you'll reconsider and provide either a global opt-out option or a configurable list of which items to send. Actually, the latter would be perfect: everyone could see what's sent and remove the items they consider confidential. In any case, much better than having to adapt the code manually or being prohibited from using crewAI at all.

@joaomdmoura
Owner

I'm still thinking about this, but a good idea someone gave me is to transform the model names into numbers, anonymizing them in a way: if it's a public model we would be able to pin it down, but if it's a proprietary one we would have no clue and not even have its name, as it would be hashed and we wouldn't have that mapped on our side, given it's not public. Again, I'm still thinking about this and other measures, but it feels like a good initial step, along with being clearer in the readme and docs about how we use all the data.
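The hashing idea described above could be sketched roughly like this (hypothetical: the actual implementation, digest length, and whether a salt is used are not specified in this thread):

```python
import hashlib

def anonymize_model_name(name: str) -> str:
    # One-way hash: the server can match digests of known public model
    # names against a precomputed table, but a proprietary name cannot
    # be read back out of its digest.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:16]

print(anonymize_model_name("llama2"))         # same input -> same digest
print(anonymize_model_name("my-private-ft"))  # unrecognizable server-side
```

Worth noting: an unsalted hash of a short, guessable name can still be brute-forced against a dictionary of likely names, which is part of why a full opt-out remains the safer option for confidential deployments.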

@mxab

mxab commented Feb 13, 2024

Hi,
first of all, crew ai looks super cool, but the telemetry also makes me a little uneasy.
There are things like the hostname:
self._add_attribute(span, "hostname", socket.gethostname())
That makes it hard for me to trust the anonymization promise, especially since the data is sent over plain HTTP (http://telemetry.crewai.com:4318).

Besides that, I'm trying to run crewai using rye, where setuptools is not available, so the telemetry module fails because it cannot import pkg_resources.
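On Python 3.8+, the stdlib `importlib.metadata` avoids the setuptools dependency entirely. A minimal sketch of the kind of version lookup the telemetry module presumably needs (the function name here is illustrative, not crewAI's actual code):

```python
from importlib.metadata import PackageNotFoundError, version

def safe_version(package: str) -> str:
    """Package version lookup without pkg_resources/setuptools."""
    try:
        return version(package)
    except PackageNotFoundError:
        # Degrade gracefully instead of crashing at import time.
        return "unknown"

print(safe_version("definitely-not-installed-xyz"))  # "unknown"
```

Since pkg_resources is deprecated upstream anyway, this swap would also help environments beyond rye.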

@joaomdmoura
Owner

joaomdmoura commented Feb 13, 2024

Yup, the hostname was definitely a bad choice; it's already being removed in the next release. I'm working on moving it to HTTPS this week, plus the anonymization I mentioned above.

@joelr45

joelr45 commented Feb 16, 2024

Agreed. Awesome tool. However, we won't be able to use it unless there is an option to fully disable telemetry, for compliance reasons.

@anuradhawick

@joelr45 did you figure out a way to turn it off? It is definitely a no-go for us as well. Realised this quite late, given the MIT license.

@tralamazza

tralamazza commented Mar 15, 2024

If anyone wants to try it, I've added a telemetry config to the Crew (defaults to True): https://github.com/tralamazza/crewAI/tree/disable-telemetry

demo: https://www.youtube.com/watch?v=iKYgUblzryc
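For readers landing here later: newer crewAI releases document an environment-variable opt-out via the standard OpenTelemetry `OTEL_SDK_DISABLED` switch (verify against your installed version's docs before relying on it):

```python
import os

# Documented opt-out in later crewAI releases; must be set before
# crewai is imported / the Crew is constructed. An exported shell
# variable (OTEL_SDK_DISABLED=true) works the same way.
os.environ["OTEL_SDK_DISABLED"] = "true"

print(os.environ["OTEL_SDK_DISABLED"])  # "true"
```

Setting it in the process environment (or a `.env` file the app loads) avoids touching any code.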

No branches or pull requests

8 participants