Do the text encoders vary between different CLIP models?

When we load CLIP models, e.g.:
model_1, preprocess = clip.load("RN50", device=device, jit=False)
model_2, preprocess = clip.load("ViT-B/16", device=device, jit=False)

Obviously, the image encoders in model_1 and model_2 are different (ResNet vs. ViT).
What about the text encoders in these two models? Are they also different?
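
One way to check this directly would be to compare the text-side hyperparameters of the two loaded models. Below is a minimal sketch, assuming the attribute names used in the OpenAI CLIP reference implementation (`context_length`, `vocab_size`, `transformer.width`, `transformer.layers`, `text_projection`). The helper names `summarize_text_encoder`, `diff_summaries`, and `compare_clip_text_encoders` are hypothetical and only for illustration.

```python
def summarize_text_encoder(model):
    """Collect the text-side hyperparameters of a loaded CLIP model.

    Assumption: attribute names follow the OpenAI CLIP reference
    implementation; other CLIP ports may name these differently.
    """
    return {
        "context_length": model.context_length,
        "vocab_size": model.vocab_size,
        "width": model.transformer.width,
        "layers": model.transformer.layers,
        "text_projection": tuple(model.text_projection.shape),
    }


def diff_summaries(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


def compare_clip_text_encoders(name_1="RN50", name_2="ViT-B/16", device="cpu"):
    """Load two CLIP models and report how their text encoders differ."""
    import clip  # imported lazily so the helpers above work without it

    model_1, _ = clip.load(name_1, device=device, jit=False)
    model_2, _ = clip.load(name_2, device=device, jit=False)
    return diff_summaries(
        summarize_text_encoder(model_1),
        summarize_text_encoder(model_2),
    )
```

Calling `compare_clip_text_encoders()` returns only the fields that differ between the two text encoders. Note that matching shapes alone would not show that the weights match; for that, the corresponding tensors in each model's `state_dict()` would have to be compared as well.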