Strings are sequential collections of zero or more letters, numbers and other symbols. We call these letters, numbers and other symbols characters. Literal string values are differentiated from identifiers by using quotation marks (either single or double).

In [2]:
'Deepak'

'Deepak'

In [3]:
name = 'Deepak'

In [4]:
name[3]

'p'

In [5]:
len(name)

6
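Because a string is a sequence of characters, the usual sequence operations also apply to it. The short sketch below reuses the `name` value from above to show negative indexing, slicing, membership and iteration.

```python
name = 'Deepak'

# Negative indices count from the end of the string
print(name[-1])       # k
# Slicing returns a new string made of characters 0, 1 and 2
print(name[0:3])      # Dee
# A step of -1 walks the string backwards
print(name[::-1])     # kapeeD
# Membership test for a substring
print('pak' in name)  # True
# Iterating yields one character at a time
print(list(name))     # ['D', 'e', 'e', 'p', 'a', 'k']
```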

Since strings are sequences, all of the sequence operations described above work as you would expect. In addition, strings have a number of methods. For example,

In [6]:
name

'Deepak'

In [39]:
name.upper()

'DEEPAK'

#### Split a string

In [40]:
name.split('p')

['Dee', 'ak']

Of these, `split` will be very useful for processing data. It takes a string and returns a list of substrings, using the separator argument as the division point; the separator itself is dropped. In the example, `'p'` is the separator. If no separator is specified, `split` splits on runs of whitespace such as tabs, newlines and spaces.
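To make the separator rules concrete, here is a small sketch contrasting the default whitespace split with an explicit separator, the optional `maxsplit` argument, and `join`, which glues pieces back together. The sample strings are made up for illustration.

```python
line = 'id  name\tcity\n'

# No argument: split on runs of whitespace and drop empty strings
print(line.split())             # ['id', 'name', 'city']
# An explicit separator keeps empty strings between repeated separators
print('a,,b'.split(','))        # ['a', '', 'b']
# The second argument (maxsplit) limits how many splits are performed
print('a b c d'.split(' ', 1))  # ['a', 'b c d']
# join is the inverse: put the separator back between the pieces
print('p'.join('Deepak'.split('p')))  # Deepak
```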

### Some String operations

#### Right, Left and Center justification

In [36]:
name.center(10)  # Center the string in a field of width 10

'  Deepak  '

In [37]:
name.rjust(10)  # Right-justify the string in a field of width 10

'    Deepak'

In [38]:
name.ljust(10)  # Left-justify the string in a field of width 10

'Deepak    '
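All three justification methods also accept an optional fill character, and format specifications offer the same alignment. A brief sketch:

```python
name = 'Deepak'

# The optional second argument sets the fill character (default is a space)
print(name.center(10, '*'))  # **Deepak**
print(name.rjust(10, '-'))   # ----Deepak
print(name.ljust(10, '.'))   # Deepak....
# If the width is not larger than the string, it is returned unchanged
print(name.center(4))        # Deepak
# Format specs align too: ^ centers, > right-justifies, < left-justifies
print(f'{name:^10}|{name:>10}|{name:<10}')
```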

#### Lower case

In [16]:
name_upper = 'DEEPAK'

In [18]:
name_upper.lower()

'deepak'

#### Find the first occurrence of a substring

In [30]:
name.find('p')

3

#### We can also search for other strings within our string

In [31]:
kid = 'Deep'

In [33]:
name.find(kid)

0

#### When the substring is not found, find() returns -1

In [34]:
name.find('d')

-1
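Note that the search is case-sensitive, which is why `'d'` is not found in `'Deepak'`. Because -1 is also a valid index (the last character), using the result of `find()` directly as an index can silently pick the wrong character. The sketch below shows a few related tools: `rfind()`, the optional start position, the `in` operator, and `index()`, which raises an error instead of returning -1.

```python
name = 'Deepak'

# find() searches from the left; rfind() searches from the right
print(name.find('e'))     # 1
print(name.rfind('e'))    # 2
# An optional start index begins the search later in the string
print(name.find('e', 2))  # 2
# Use `in` when you only need a yes/no answer
print('D' in name)        # True
# index() works like find() but raises ValueError instead of returning -1
try:
    name.index('d')
except ValueError as exc:
    print('ValueError:', exc)
```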

## Interesting!!

A major difference between lists and strings is that lists can be modified while strings cannot. This is referred to as mutability. Lists are mutable; strings are immutable. For example, we can change an item in a list by using indexing and assignment. With a string that change is not allowed.

In [42]:
name[3] = 'r'

TypeError: 'str' object does not support item assignment
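To contrast the two behaviours, the sketch below makes the same change to a list of characters (allowed, since lists are mutable) and then builds a new string with slicing, which is the usual way to "change" an immutable string.

```python
letters = list('Deepak')
# Lists are mutable: item assignment changes the list in place
letters[3] = 'r'
print(''.join(letters))  # Deerak

name = 'Deepak'
# Strings are immutable, so we build a new string instead
new_name = name[:3] + 'r' + name[4:]
print(new_name)          # Deerak
print(name)              # Deepak (the original is untouched)
```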

### String Comparisons
Comparing strings is one of the most important string operations to learn, and something we will do often.

In Python there are two general kinds of comparison: identity, using "is", and equality, using "==".

In [31]:
a='this is a very long string'
b='this is a very long string'

In [57]:
# The result of "is" may or may not be True, depending on whether Python
# stores both strings as the same object in memory.
a is b

False

In [68]:
# For equality check, we should actually be using "=="
a == b

True

#### Using id() to check whether two names refer to the same object in memory

In [59]:
# id() shows a and b stored at different locations, i.e. they are different objects
id(a)

2042491399152

In [60]:
id(b)

2042493478832
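If we genuinely want equal strings to share one object, the standard library offers `sys.intern`, which returns a single canonical object for each distinct string value. This is a sketch of the idea; the string built with `join` is just a convenient way to create an equal string at run time.

```python
import sys

a = 'this is a very long string'
# Build an equal string at run time, so it is a separate object
b = ''.join(['this is a very long', ' string'])
print(a == b)  # True: same characters
print(a is b)  # False in CPython: b is a freshly built object

# sys.intern returns one canonical object per distinct string value
a_i = sys.intern(a)
b_i = sys.intern(b)
print(a_i is b_i)  # True: both names now refer to the same object
```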

More examples, including one where "is" results in True:

In [61]:
'3' == '3'

True

In [62]:
a = 3
b = 3

In [63]:
a is b

True

In [64]:
a == b

True

Now that you have seen this, let's try larger numbers and check for equality again:

In [65]:
a = 1211231232
b = 1211231232

In [66]:
a is b

False

In [67]:
a == b

True

### Interesting!!
Now that's strange!! Though a and b hold the same value, the "is" check resulted in False. Why?

### Caching of small integers

Small values like 3 happen to be "cached" (an implementation detail!) in CPython in the hope of saving memory if they're used often; large values like 1211231232 are not so cached -- again, 100% an implementation detail. You'd never want to depend on such behavior!

Hence, when a and b hold large numbers, "is" returns False because they are two separate objects stored at different locations, while "==" behaves the way we expect.
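The cache boundary can be probed with values computed at run time, so that no sharing of literal constants is involved. This is a sketch that relies on CPython's current small-integer cache (-5 to 256), which is an implementation detail; `base` is just a helper variable that keeps the additions from happening at compile time.

```python
import platform

base = 200  # a variable, so the sums below are computed at run time

x, y = base + 56, base + 56  # two separate computations of 256
small_same = x is y          # 256 is inside CPython's small-int cache

x, y = base + 57, base + 57  # two separate computations of 257
large_same = x is y          # 257 is just outside the cache

print(platform.python_implementation(), small_same, large_same)
# On CPython this prints: CPython True False
```

Whatever `is` reports, `x == y` is True in both cases, which is exactly why `==` is the right tool for comparing numbers.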

#### Let's check that memory allocation

In [47]:
a = 3
b = 3
a is b

True

In [48]:
id(a)

2012938800

In [49]:
id(b)

2012938800

#### It turns out the integer 3 has been cached: both a and b refer to the object with id 2012938800

In [50]:
# We'll try it with bigger numbers now
a = 12311412415
b = 12311412415

In [51]:
a is b

False

In [52]:
id(a)

2042491398832

In [53]:
id(b)

2042491399280

#### So, as expected, the big numbers haven't been cached and are stored at different locations

In [54]:
# Let's look at an example
a = 19998989890
b = 19998989889 +1

In [55]:
a is b

False

In [56]:
a == b

True

"is" checks whether two names refer to the same object in memory. Here a refers to one object, 19998989890, while b refers to a different object that happens to have the same value because it was computed as 19998989889 + 1.

"==", on the other hand, compares values. So a == b is True even though a and b are two different objects.
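The distinction matters most for mutable objects, where two names for the same object see each other's changes. A short sketch with lists (the names are illustrative):

```python
first = [1, 2, 3]
second = [1, 2, 3]  # equal contents, but a separate list object
alias = first       # a second name for the *same* list object

print(first == second, first is second)  # True False
print(first == alias, first is alias)    # True True

# Changing the list through `alias` is visible through `first`
alias.append(4)
print(first)   # [1, 2, 3, 4]
print(second)  # [1, 2, 3]
```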

### Other examples:

In [70]:
import datetime
datetime.date.today() == datetime.date.today()

True

In [71]:
datetime.date.today() is datetime.date.today()

False

This is because each call to datetime.date.today() creates a new date object: the two are equal in value but are still two different objects.

In [85]:
A = [1, 2, 3, 4]
B = A[0:2]

In [86]:
id(A) == id(B)

False

In [87]:
id(A[0]) == id(B[0])

True

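CPython caches small integers, so any value from -5 to 256 is a single shared object and has the same id every time we check it within a running interpreter.

Slicing creates a new list, which is why id(A) == id(B) is False. However, the slice is a shallow copy: B = A[0:2] essentially does B[0] = A[0] and B[1] = A[1], copying references rather than the objects themselves.

So the object (the integer 1) in A[0] is the very same object as the one in B[0]: A[0] == 1 == B[0] and id(A[0]) == id(B[0]). Because slicing copies references, this would hold even for large integers that are not cached.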
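To see that slicing shares the element objects regardless of integer caching, the sketch below uses a large integer and a nested list as elements:

```python
A = [10**20, [1, 2]]
B = A[0:2]  # slicing copies the references, not the objects

print(B is A)        # False: B is a new list
print(B[0] is A[0])  # True: same int object, even though 10**20 is not cached
print(B[1] is A[1])  # True: same inner list

# Because the inner list is shared, mutating it through B is visible via A
B[1].append(3)
print(A)             # [100000000000000000000, [1, 2, 3]]
```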

### The logic behind this

if x is y, then x == y is normally also True (one exception is float('nan'), which is not equal even to itself)

but this should never be read to mean

if x == y, then x is y
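The NaN exception is easy to demonstrate. Floating-point NaN compares unequal to everything, including itself, although it is of course identical to itself:

```python
import math

x = float('nan')
print(x is x)         # True: same object
print(x == x)         # False: NaN never compares equal, even to itself
print(math.isnan(x))  # True: the reliable way to test for NaN
# List comparison checks identity before equality, so this is True
print([x] == [x])     # True
```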

### Interesting!!
We should use "==" when comparing values and "is" when comparing identities. (Also, from an English point of view, "equals" is different from "is".)

### For future reference:
Use "==" if you mean the objects should represent the same value (the most common usage), and "is" if you mean they should be the very same object in memory (you'd know if you needed the latter).

Also, you can customize "==" for your own classes by defining the `__eq__` method, but you can't overload "is".
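Here is a minimal sketch of overloading `==` with `__eq__`. The `Point` class is a hypothetical example, not something from the notes above. It also shows the most common legitimate use of `is`: checking for the singleton `None`.

```python
class Point:
    """Hypothetical example class whose equality is based on coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # NotImplemented lets Python try the other operand, then fall back to identity
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p1 = Point(1, 2)
p2 = Point(1, 2)
print(p1 == p2)  # True: our __eq__ compares coordinates
print(p1 is p2)  # False: still two separate objects

# `is` is the idiomatic check for singletons such as None
value = None
print(value is None)  # True
```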

### Summary of this:
is : used for identity testing (are they the very same object?)

== : used for equality testing (do they have equal values?)

### Some things to know about Memory allocation and CPython
CPython is the reference implementation of the Python programming language. Written in C, CPython is the default and most widely used implementation of the language. CPython is an interpreter.

CPython allocates objects from a heap that becomes fragmented as objects are allocated and freed. As a result, the objects referred to by the elements of a single list can be stored at memory locations that are far apart and show no incremental pattern.

Python numbers are not simple pieces of data. They are full objects, and in Python 3 `int` is an arbitrary-precision type: CPython stores the value in as many internal digits as it needs, so an integer grows automatically as the value gets larger. (Python 2 had separate `int` and `long` types and promoted from one to the other automatically.)

Just because we store (say) a 32-bit int in a data structure in a scripting language doesn't mean we'll end up using only 32 bits of memory. There is always metadata attached to any object we store, such as its type, reference count and size.

Knowing Python and knowing a particular implementation (e.g. CPython) are two entirely different things. And even knowing CPython inside out won't tell you exactly where objects will land in memory, because CPython ultimately relies on memory allocators provided by the C library and the operating system, which aren't part of CPython itself.
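The per-object overhead can be inspected with `sys.getsizeof`, which reports an object's size in bytes including its header. The exact numbers vary across Python versions and platforms, so this sketch only prints them rather than promising specific values.

```python
import sys

# Even a small int is a full object with a header, far more than 4 bytes
print(sys.getsizeof(1))
# Larger ints need more internal digits, so they take more space
print(sys.getsizeof(2**100))
# Even an empty string carries metadata
print(sys.getsizeof(''))
```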

In [83]:
id(a)

2042491398640

### So what does id() return??

It returns "the 'identity' of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value." (Python Standard Library - Built-in Functions) A number that is unique among the objects alive at the same time. Nothing more, and nothing less. Think of it as a social-security number or employee id number for Python objects.

### Is it the same with memory addresses in C?

Conceptually, yes, in that they are both guaranteed to be unique in their universe during their lifetime. And in one particular implementation of Python (CPython), it actually is the memory address of the corresponding C object.
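Two practical consequences follow from that definition: an object keeps its id for its whole lifetime even when it is mutated, and for two live objects `is` gives the same answer as comparing their ids. A short sketch:

```python
items = []
before = id(items)
items.append(1)  # mutate the list in place
print(id(items) == before)  # True: same object, same id

alias = items
# For two live objects, `is` agrees with comparing their ids
print((alias is items) == (id(alias) == id(items)))  # True
```

Because ids can be reused once an object is freed, comparing a saved id against a new object's id is only meaningful while the original object is still alive.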

In [84]:
nums = [1, 2, 3]  # avoid naming this `list`, which would shadow the built-in
print('The ids are {},{},{}'.format(id(nums[0]), id(nums[1]), id(nums[2])))

The ids are 2012938736,2012938768,2012938800


In [80]:
2012938736 - 2012938768

-32

In [81]:
2012938768 - 2012938800

-32

In this example the ids of 1, 2 and 3 differ by exactly 32. That is because, in CPython, the cached small integers are preallocated next to each other, so consecutive values sit at evenly spaced addresses. For most other objects that's not the case. So the question is:

Why don't the ids simply increase by the size of the data type (say, 4 bytes for a 32-bit int)?

Answer: because a Python list is not a C array of values, and a list element is a reference to an object, not the object itself.
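One visible consequence of "elements are references" is the classic pitfall of repeating a nested list, sketched below. For comparison, the standard-library `array` module does store raw C values back to back, much like a C array.

```python
import array

# [[0] * 2] * 3 repeats a reference to ONE inner list three times
shared = [[0] * 2] * 3
shared[0][0] = 1
print(shared)    # [[1, 0], [1, 0], [1, 0]]: every row is the same object

# A list comprehension creates a separate inner list for each row
separate = [[0] * 2 for _ in range(3)]
separate[0][0] = 1
print(separate)  # [[1, 0], [0, 0], [0, 0]]

# array.array stores raw C values contiguously, without per-element objects
nums = array.array('i', [1, 2, 3])
print(nums.itemsize)  # bytes per C int on this platform, typically 4
```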

### Check the Implementation of Python we are using

In [82]:
import platform    
platform.python_implementation()

'CPython'
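The `sys` module exposes the same information through `sys.implementation`, whose `name` attribute is the lowercase implementation name. A quick sketch:

```python
import platform
import sys

print(platform.python_implementation())  # e.g. CPython or PyPy
print(sys.implementation.name)           # lowercase form, e.g. cpython
print(platform.python_version())         # the interpreter version as a string
```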

## Replace in String

#### replace()
The method replace(old, new[, count]) returns a copy of the string in which the occurrences of old have been replaced with new. If the optional count argument is given, only the first count occurrences are replaced.

In [2]:
sentence = "this is probably the most fun thing I have ever done. It literally is!!"
print(sentence.replace("is", "was"))
print(sentence.replace("is", "was", 1))

thwas was probably the most fun thing I have ever done. It literally was!!
thwas is probably the most fun thing I have ever done. It literally is!!
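As the output shows, `replace()` matches substrings, so the "is" inside "this" is replaced too, and because strings are immutable it returns a new string rather than changing the original. The sketch below shows both points; the whole-word variant is just one simple option (it also collapses runs of whitespace into single spaces).

```python
sentence = 'this is fun'
result = sentence.replace('is', 'was')
print(result)    # thwas was fun
print(sentence)  # this is fun (unchanged, strings are immutable)

# One way to replace whole words only: split, swap matching words, rejoin
words = ['was' if w == 'is' else w for w in sentence.split()]
print(' '.join(words))  # this was fun
```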


## Miscellaneous Operations

#### istitle()
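The method istitle() returns True if the string is title-cased: uppercase characters may only follow uncased characters (such as spaces or punctuation), lowercase characters may only follow cased ones, and there must be at least one cased character. Otherwise it returns False.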

In [4]:
s = "This Is A String Example...Wow!!!"
print(s.istitle())
s = "This is a string example....wow!!!"
print(s.istitle())

True
False
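The companion method `title()` converts a string to title case, so its result normally passes `istitle()`. The sketch below also shows two edge cases: a string with no cased characters, and an apostrophe, which `title()` treats as a word boundary.

```python
text = 'this is a string example'
titled = text.title()
print(titled)            # This Is A String Example
print(titled.istitle())  # True

# istitle() needs at least one cased character
print('123'.istitle())   # False
# title() capitalizes after any non-letter, including apostrophes
print("it's".title())    # It'S
```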
