### Q1. In Python 3.X, what are the names and functions of string object types?

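Python 3.X has three string object types: str holds Unicode text, bytes holds immutable binary data, and bytearray is a mutable variant of bytes. The str type has a set of built-in methods that you can use on strings.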

Note: All string methods return new values. They do not change the original string.


| Method | Description |
| --- | --- |
| capitalize() | Converts the first character to upper case and the rest to lower case |
| casefold() | Returns a casefolded copy for caseless matching (more aggressive than lower()) |
| center() | Returns the string centered in a field of a given width |
| count() | Returns the number of non-overlapping times a specified value occurs in a string |
| encode() | Returns the string encoded as a bytes object |
| endswith() | Returns True if the string ends with the specified value |
| expandtabs() | Returns a copy in which tab characters are replaced by spaces, using the given tab size |
| find() | Searches the string for a specified value and returns the lowest position where it was found, or -1 if it was not found |
| format() | Formats specified values in a string |
| format_map() | Like format(), but takes the values from a single mapping |
| index() | Like find(), but raises ValueError if the value is not found |
| isalnum() | Returns True if all characters in the string are alphanumeric |
| isalpha() | Returns True if all characters in the string are alphabetic |
| isascii() | Returns True if all characters in the string are ASCII characters |
| isdecimal() | Returns True if all characters in the string are decimals |
| isdigit() | Returns True if all characters in the string are digits |
| isidentifier() | Returns True if the string is a valid identifier |
| islower() | Returns True if all cased characters in the string are lower case |
| isnumeric() | Returns True if all characters in the string are numeric |
| isprintable() | Returns True if all characters in the string are printable |
| isspace() | Returns True if all characters in the string are whitespace |
| istitle() | Returns True if the string follows the rules of a title |
| isupper() | Returns True if all cased characters in the string are upper case |
| join() | Joins the elements of an iterable into one string, using the string as the separator |
| ljust() | Returns a left-justified version of the string |
| lower() | Converts a string into lower case |
| lstrip() | Returns a left-trimmed version of the string |
| maketrans() | Returns a translation table to be used with translate() |
| partition() | Splits the string at the first occurrence of a separator and returns a tuple of three parts |
| replace() | Returns a string where occurrences of a specified value are replaced with another value |
| rfind() | Searches the string for a specified value and returns the last position where it was found, or -1 if it was not found |
| rindex() | Like rfind(), but raises ValueError if the value is not found |
| rjust() | Returns a right-justified version of the string |
| rpartition() | Splits the string at the last occurrence of a separator and returns a tuple of three parts |
| rsplit() | Splits the string at the specified separator, starting from the right, and returns a list |
| rstrip() | Returns a right-trimmed version of the string |
| split() | Splits the string at the specified separator and returns a list |
| splitlines() | Splits the string at line breaks and returns a list |
| startswith() | Returns True if the string starts with the specified value |
| strip() | Returns a trimmed version of the string |
| swapcase() | Swaps cases: lower case becomes upper case and vice versa |
| title() | Converts the first character of each word to upper case |
| translate() | Returns a copy in which each character is mapped through a translation table |
| upper() | Converts a string into upper case |
| zfill() | Pads the string on the left with zeros until it reaches the specified width |
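
A short sketch of a few of the methods listed above may help. It shows that the methods return new strings and leave the original unchanged, and how find() and index() differ when the value is missing. The variable names are chosen only for illustration.

```python
original = "hello world"

# String methods return new strings; the original is left untouched.
shouted = original.upper()
print(shouted)
print(original)

# find() reports a missing value with -1, while index() raises ValueError.
missing_position = original.find("xyz")
try:
    original.index("xyz")
    index_raised = False
except ValueError:
    index_raised = True
print(missing_position, index_raised)

# split() breaks a string into a list; join() glues a list back together.
words = original.split(" ")
rejoined = "-".join(words)
print(words, rejoined)

# zfill() pads on the left with zeros up to the requested width.
padded = "42".zfill(5)
print(padded)
```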


### Q2. How do the string forms in Python 3.X vary in terms of operations?

string — Common string operations

* String constants

The constants defined in this module are:

string.ascii_letters
The concatenation of the ascii_lowercase and ascii_uppercase constants described below. This value is not locale-dependent.

string.ascii_lowercase
The lowercase letters 'abcdefghijklmnopqrstuvwxyz'. This value is not locale-dependent and will not change.

string.ascii_uppercase
The uppercase letters 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. This value is not locale-dependent and will not change.

string.digits
The string '0123456789'.

string.hexdigits
The string '0123456789abcdefABCDEF'.

string.octdigits
The string '01234567'.

string.punctuation
String of ASCII characters which are considered punctuation characters in the C locale: ``!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~``

string.printable
String of ASCII characters which are considered printable. This is a combination of digits, ascii_letters, punctuation, and whitespace.

string.whitespace
A string containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab.
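
As a rough illustration of how these constants can be used, the sketch below checks a sample string against them. The sample value and the variable names are made up for the example.

```python
import string

# ascii_letters is simply the two case-specific constants joined together.
letters_match = string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase
print(letters_match)

# The constants are ordinary strings, so membership tests work on them.
sample = "s3cret!"  # hypothetical value used only for this example
has_digit = any(ch in string.digits for ch in sample)
has_punctuation = any(ch in string.punctuation for ch in sample)
print(has_digit, has_punctuation)

# printable combines digits, ascii_letters, punctuation and whitespace.
printable_length = len(string.printable)
print(printable_length)
```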

### Q3. In 3.X, how do you put non-ASCII Unicode characters in a string?

Non-ASCII domains are called Internationalized Domain Names (IDNs). ... They are not confined to strictly ASCII characters.
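
For Python string literals themselves, a minimal sketch of the usual options follows. It assumes the source file is saved as UTF-8, which is the Python 3 default; the sample text is chosen only for illustration.

```python
# 1. Type the characters directly (works because Python 3 source is UTF-8 by default).
direct = "café ☃"

# 2. Use a hex escape for code points up to 0xFF.
hex_escape = "caf\xe9"

# 3. Use a 4-digit \u escape, or an 8-digit \U escape for higher code points.
u_escape = "caf\u00e9 \u2603"
long_escape = "\U0001F600"

# 4. Use the character's Unicode name.
named = "caf\N{LATIN SMALL LETTER E WITH ACUTE} \N{SNOWMAN}"

print(direct, hex_escape, u_escape, long_escape, named)
print(u_escape == named)
```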

I have a string that looks like so:

In [7]:
s = '6Â 918Â 417Â 712'

The clear-cut way to trim this string (as I understand Python), given that the string is in a variable called s, is simply:

In [8]:
s.replace('Â ', '')

'6918417712'

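That should do the trick. But of course it complains that the non-ASCII character '\xc2' in file blabla.py is not encoded. (This is the Python 2 error for a source file that contains non-ASCII characters without an encoding declaration; Python 3 reads source files as UTF-8 by default.)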

I never quite could understand how to switch between different encodings.

Here's the code, it really is just the same as above, but now it's in context. The file is saved as UTF-8 in notepad and has the following header:

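In [None]:
from urllib.request import urlopen  # Python 3 location of urlopen
from bs4 import BeautifulSoup

f = urlopen(url)  # url is assumed to be defined earlier
soup = BeautifulSoup(f, 'html.parser')
s = soup.find('div', {'id':'main_count'})
# Printing s here works fine; it shows 6Â 918Â 417Â 712
s.replace('Â ','')
save_main_count(s)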

It gets no further than s.replace...
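
One possible explanation, offered as an assumption since the original page is not available, is that the digit groups were separated by non-breaking spaces (U+00A0) whose UTF-8 bytes were decoded as Latin-1, which produces the 'Â' characters. Under that assumption, a sketch of the repair in Python 3 could look like this. Note that str.replace() returns a new string, so its result has to be assigned.

```python
# Assumption: the page really contained non-breaking spaces between digit groups.
clean_source = "6\u00a0918\u00a0417\u00a0712"

# Reproduce the garbling: UTF-8 bytes wrongly decoded as Latin-1.
garbled = clean_source.encode("utf-8").decode("latin-1")
print(garbled)

# Undo the wrong decoding, then drop the non-breaking spaces.
repaired = garbled.encode("latin-1").decode("utf-8")
digits_only = repaired.replace("\u00a0", "")
print(digits_only)
```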

* How to remove non-ASCII characters in Python

Removing non-ASCII characters results in a string that only contains ASCII characters. For example, removing non-ASCII characters from "àa string withé fuünny charactersß" results in "a string with funny characters".

Call str.encode(encoding, errors) with encoding as "ASCII" and errors as "ignore" to return a bytes object with the non-ASCII characters dropped. Then call bytes.decode() to convert the result back to a str.

In [2]:
string_with_nonASCII = "àa string withé fuünny charactersß."

In [3]:
encoded_string = string_with_nonASCII.encode("ascii", "ignore")

In [4]:
decode_string = encoded_string.decode()

print(decode_string)

a string with funny characters.


### Q4. In Python 3.X, what are the key differences between text-mode and binary-mode files?

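The major difference between the two is the kind of data each holds: a text file contains characters (letters, digits, and special characters or symbols), whereas a binary file contains raw bytes that are not necessarily meant to be read as text. In Python 3.X this shows up in the file mode: a file opened in text mode (for example "r" or "w") is decoded and encoded for you and yields str objects, while a file opened in binary mode (for example "rb" or "wb") yields bytes objects unchanged.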

* Text Files


Text files are a special subset of binary files that store human-readable characters, either as plain text or as a rich text document. Text files also store data as sequential bytes, but the bits in a text file represent characters.

Text files are less prone to corruption, because an undesired change usually shows up as soon as the file is opened and can then easily be removed.

Text files are of two types:


1. Plain text files: These files store an End of Line (EOL) marker at the end of each line to represent a line break, and an End of File (EOF) marker at the end of the file to represent the end of the file.

2. Rich text files: These files follow the same scheme as plain text files but may also store text-related information such as text colour, text style, and font style.

Because they use a simple, standard format to store data, text files are one of the most widely used file formats for storing textual data and are supported by many applications.


* Binary File

Binary files store data as a sequence of bytes, with bits grouped into eight or sometimes sixteen. These bits represent custom data, and a single binary file can store multiple types of data (images, audio, text, etc.).

Binary files can have custom file formats. The developer who designs such a format converts the information to be stored into bits and arranges them so that the supporting application can understand them and read them back when needed.

A common example of a binary file is an image file such as .PNG or .JPG. If you open one of these files in a text editor you may see unrecognizable characters, but when it is opened in a supporting image viewer it is shown as a single image. This is because the file is in a binary format and contains data as a sequence of bytes. When the text editor reads these bytes and tries to convert them into characters, it gets unintended special characters and displays them to the user.

Binary files may also store file information such as the file name and file format. This may be included as a header and is visible even when the file is opened in a text editor.

Since binary files store data in sequential bytes, a small change in the file can corrupt it and make it unreadable to the supporting application.

![image.png](attachment:image.png)
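
In Python 3.X terms, the difference can be sketched as follows: the same file read in text mode yields a str, and read in binary mode yields the raw bytes. The file name and sample text are hypothetical, and a temporary directory is used so that nothing is left behind in the working directory.

```python
import os
import tempfile

text = "héllo\n"
path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

# Text mode: str goes in and is encoded on the way out.
# newline="\n" switches off newline translation so the bytes are the same on every platform.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write(text)

# Text mode read: bytes are decoded back into a str.
with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()

# Binary mode read: the raw bytes come back with no decoding at all.
with open(path, "rb") as f:
    as_bytes = f.read()

print(type(as_text), repr(as_text))
print(type(as_bytes), repr(as_bytes))
```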

### Q5. How can you interpret a Unicode text file containing text encoded in a different encoding than your platform's default?

To process text effectively in Python 3, it’s necessary to learn at least a tiny amount about Unicode and text encodings:

1. Python 3 always stores text strings as sequences of Unicode code points. These are values in the range 0-0x10FFFF. They don’t always correspond directly to the characters you read on your screen, but that distinction doesn’t matter for most text manipulation tasks.

2. To store text as binary data, you must specify an encoding for that text.

3. The process of converting from a sequence of bytes (i.e. binary data) to a sequence of code points (i.e. text data) is decoding, while the reverse process is encoding.

4. For historical reasons, the most widely used encoding is ascii, which can only handle Unicode code points in the range 0-0x7F (i.e. ASCII is a 7-bit encoding).

5. There are a wide variety of ASCII compatible encodings, which ensure that any appearance of a valid ASCII value in the binary data refers to the corresponding ASCII character.

6. “utf-8” is becoming the preferred encoding for many applications, as it is an ASCII-compatible encoding that can encode any valid Unicode code point.

7. “latin-1” is another significant ASCII-compatible encoding, as it maps byte values directly to the first 256 Unicode code points. (Note that Windows has its own “latin-1” variant called cp1252, but, unlike the ISO “latin-1” implemented by the Python codec with that name, the Windows specific variant doesn’t map all 256 possible byte values.)

8. There are also many ASCII incompatible encodings in widespread use, particularly in Asian countries (which had to devise their own solutions before the rise of Unicode) and on platforms such as Windows, Java and the .NET CLR, where many APIs accept text as UTF-16 encoded data.

9. The locale.getpreferredencoding() call reports the encoding that Python will use by default for most operations that require an encoding (e.g. reading in a text file without a specified encoding). This is designed to aid interoperability between Python and the host operating system, but can cause problems with interoperability between systems (if encoding issues are not managed consistently).

10. The sys.getfilesystemencoding() call reports the encoding that Python will use by default for most operations that both require an encoding and involve textual metadata in the filesystem (e.g. determining the results of os.listdir()).

11. If you’re a native English speaker residing in an English speaking country (like me!) it’s tempting to think “but Python 2 works fine, why are you bothering me with all this Unicode malarkey?”. It’s worth trying to remember that we’re actually a minority on this planet and, for most people on Earth, ASCII and latin-1 can’t even handle their name, let alone any other text they might want to write or process in their native language.
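
In practice, reading a file whose encoding differs from the platform default usually comes down to passing the encoding explicitly to open(). The sketch below writes a small Latin-1 file and reads it back; the file name and sample text are made up for the example.

```python
import locale
import os
import sys
import tempfile

path = os.path.join(tempfile.mkdtemp(), "latin1.txt")  # hypothetical file name

# Create a file whose bytes are Latin-1, not UTF-8.
with open(path, "wb") as f:
    f.write("café".encode("latin-1"))

# Name the file's real encoding instead of relying on the platform default.
with open(path, "r", encoding="latin-1") as f:
    text = f.read()
print(text)

# Decoding the same bytes with the wrong encoding fails.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    wrong_decode_failed = False
except UnicodeDecodeError:
    wrong_decode_failed = True
print(wrong_decode_failed)

# The defaults that apply when no encoding is given vary by platform.
print(locale.getpreferredencoding(False))
print(sys.getfilesystemencoding())
```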

### Q6. What is the best way to make a Unicode text file in a particular encoding format?

Writing Unicode to a text file adds a line or multiple lines of Unicode text to the file. UTF-8 is the most common Unicode character encoding.

Call str.encode(encoding) with encoding set to "utf8" to encode str. Call open(file, mode) with mode set to "wb" to open the file. Binary mode writes the encoded bytes exactly as given, so the UTF-8 data is preserved. Call file.write(data) to write the data to the file.


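In [None]:
unicode_text = 'ʑʒʓʔʕʗʘʙʚʛʜʝʞ'
encoded_unicode = unicode_text.encode("utf8")

# "wb" writes the already-encoded bytes unchanged; "with" closes the file afterwards
with open("textfile.txt", "wb") as a_file:
    a_file.write(encoded_unicode)

# "r" reads the file as text; name the encoding so the platform default is not used
with open("textfile.txt", "r", encoding="utf8") as a_file:
    contents = a_file.read()

print(contents)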
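
An alternative to encoding by hand is to open the file in text mode and pass the desired encoding to open(), which then encodes on every write. The sketch below uses UTF-16 only to show that any supported encoding can be named; the file name is hypothetical.

```python
import os
import tempfile

unicode_text = "ʑʒʓʔʕ"
path = os.path.join(tempfile.mkdtemp(), "textfile_utf16.txt")  # hypothetical file name

# Text mode with an explicit encoding: open() encodes the str for us.
with open(path, "w", encoding="utf-16") as out_file:
    out_file.write(unicode_text)

# Reading back with the same encoding restores the original text.
with open(path, "r", encoding="utf-16") as in_file:
    round_trip = in_file.read()

# The bytes on disk are UTF-16, so they differ from the UTF-8 form of the same text.
with open(path, "rb") as raw_file:
    raw_bytes = raw_file.read()

print(round_trip == unicode_text)
print(raw_bytes != unicode_text.encode("utf-8"))
```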

### Q7. What qualifies ASCII text as a form of Unicode text?

The first 128 Unicode code points represent the ASCII characters, which means that any ASCII text is also a UTF-8 text. UCS-2 uses two bytes (16 bits) for each character but can only encode the first 65,536 code points, the so-called Basic Multilingual Plane (BMP).

ASCII defines 128 characters, which map to the numbers 0–127. Unicode defines (less than) $2^{21}$ characters, which, similarly, map to numbers 0–$2^{21}$ (though not all numbers are currently assigned, and some are reserved).
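
The overlap between ASCII and Unicode can be checked directly in Python. This is a small sketch with made-up sample strings.

```python
ascii_text = "Hello, world!"

# Every ASCII character has a code point below 128 ...
all_below_128 = all(ord(ch) < 128 for ch in ascii_text)

# ... so the ASCII and UTF-8 encodings of ASCII text are byte-for-byte identical.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(all_below_128, same_bytes)

# A non-ASCII character needs more than one byte in UTF-8 and cannot be encoded as ASCII.
e_acute = "\u00e9"
utf8_length = len(e_acute.encode("utf-8"))
try:
    e_acute.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(utf8_length, ascii_failed)

# The highest Unicode code point is 0x10FFFF, which is below 2**21.
highest_fits = 0x10FFFF < 2 ** 21
print(highest_fits)
```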

When sending out your message, you have the option to choose between "TEXT" or "UNICODE" message encoding. With TEXT encoding, you can use all the most common characters in the alphabet. With UNICODE encoding, you can use special characters, like Chinese, Arabic, emoticons, ...

### Q8. How much of an effect does the change in string types in Python 3.X have on your code?

This article explains the new features in Python 3.0, compared to 2.6. Python 3.0, also known as “Python 3000” or “Py3K”, is the first ever intentionally backwards incompatible Python release. There are more changes than in a typical release, and more that are important for all Python users. Nevertheless, after digesting the changes, you’ll find that Python really hasn’t changed all that much – by and large, we’re mostly fixing well-known annoyances and warts, and removing a lot of old cruft.

This article doesn’t attempt to provide a complete specification of all new features, but instead tries to give a convenient overview. For full details, you should refer to the documentation for Python 3.0, and/or the many PEPs referenced in the text. If you want to understand the complete implementation and design rationale for a particular feature, PEPs usually have more details than the regular documentation; but note that PEPs usually are not kept up-to-date once a feature has been fully implemented.

Due to time constraints this document is not as complete as it should have been. As always for a new release, the Misc/NEWS file in the source distribution contains a wealth of detailed information about every small thing that was changed.
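
As one concrete illustration of how the string-type change affects everyday code, the sketch below shows that Python 3 keeps text (str) and binary data (bytes) strictly apart. The values are made up for the example.

```python
text = "abc"    # str: a sequence of Unicode code points
data = b"abc"   # bytes: a sequence of integers in the range 0-255

# Mixing the two types raises an error; there is no implicit conversion.
try:
    text + data
    mixing_failed = False
except TypeError:
    mixing_failed = True
print(mixing_failed)

# An explicit decode (or encode) is needed to combine them.
combined = text + data.decode("ascii")
print(combined)

# Indexing also differs: str gives a one-character str, bytes gives an int.
first_char = text[0]
first_byte = data[0]
print(repr(first_char), repr(first_byte))
```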